From 103c483943c242eadef626f0b50496164d04daab Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Fri, 18 Dec 2020 20:02:40 +0800 Subject: [PATCH 01/43] Refactor get_started.md primarily --- docs/getting_started.md | 484 ++++++++++------------------------------ 1 file changed, 117 insertions(+), 367 deletions(-) diff --git a/docs/getting_started.md b/docs/getting_started.md index 178e8e2bb4..03b48a3fb2 100644 --- a/docs/getting_started.md +++ b/docs/getting_started.md @@ -1,255 +1,182 @@ -# Getting Started +# Prerequisites -This page provides basic tutorials about the usage of MMDetection3D. -For installation instructions, please see [install.md](install.md). +- Linux or macOS (Windows is not currently officially supported) +- Python 3.6+ +- PyTorch 1.3+ +- CUDA 9.2+ (If you build PyTorch from source, CUDA 9.0 is also compatible) +- GCC 5+ +- [mmcv](https://github.com/open-mmlab/mmcv) -## Prepare datasets +# Installation -It is recommended to symlink the dataset root to `$MMDETECTION3D/data`. -If your folder structure is different from the following, you may need to change the corresponding paths in config files. +## Install mmdetection -``` -mmdetection3d -├── mmdet3d -├── tools -├── configs -├── data -│ ├── nuscenes -│ │ ├── maps -│ │ ├── samples -│ │ ├── sweeps -│ │ ├── v1.0-test -| | ├── v1.0-trainval -│ ├── kitti -│ │ ├── ImageSets -│ │ ├── testing -│ │ │ ├── calib -│ │ │ ├── image_2 -│ │ │ ├── velodyne -│ │ ├── training -│ │ │ ├── calib -│ │ │ ├── image_2 -│ │ │ ├── label_2 -│ │ │ ├── velodyne -│ ├── waymo -│ │ ├── waymo_format -│ │ │ ├── training -│ │ │ ├── validation -│ │ │ ├── testing -│ │ │ ├── gt.bin -│ │ ├── kitti_format -│ │ │ ├── ImageSets -│ ├── lyft -│ │ ├── v1.01-train -│ │ │ ├── v1.01-train (train_data) -│ │ │ ├── lidar (train_lidar) -│ │ │ ├── images (train_images) -│ │ │ ├── maps (train_maps) -│ │ ├── v1.01-test -│ │ │ ├── v1.01-test (test_data) -│ │ │ ├── lidar (test_lidar) -│ │ │ ├── images (test_images) -│ │ │ ├── maps (test_maps) -│ │ ├── train.txt -│ │ ├── val.txt -│ │ ├── test.txt -│ │ ├── sample_submission.csv -│ ├── scannet -│ │ ├── meta_data -│ │ ├── scans -│ │ ├── batch_load_scannet_data.py -│ │ ├── load_scannet_data.py -│ │ ├── scannet_utils.py -│ │ ├── README.md -│ ├── sunrgbd -│ │ ├── OFFICIAL_SUNRGBD -│ │ ├── matlab -│ │ ├── sunrgbd_data.py -│ │ ├── sunrgbd_utils.py -│ │ ├── README.md +a. Create a conda virtual environment and activate it. +```shell +conda create -n open-mmlab python=3.7 -y +conda activate open-mmlab ``` -Download KITTI 3D detection data [HERE](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d). Prepare kitti data by running +b. Install PyTorch and torchvision following the [official instructions](https://pytorch.org/), e.g., + +```shell +conda install pytorch torchvision -c pytorch +``` -```bash -mkdir ./data/kitti/ && mkdir ./data/kitti/ImageSets +Note: Make sure that your compilation CUDA version and runtime CUDA version match. +You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/). 
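# A minimal way to compare the two versions (assuming PyTorch is already
# installed and the CUDA toolkit is on your PATH):
python -c "import torch; print(torch.version.cuda)"   # CUDA version PyTorch was built with
nvcc --version                                        # locally installed CUDA toolkit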
-# Download data split -wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/test.txt -wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/train.txt -wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/val.txt -wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/trainval.txt +`E.g.1` If you have CUDA 10.1 installed under `/usr/local/cuda` and would like to install +PyTorch 1.5, you need to install the prebuilt PyTorch with CUDA 10.1. -python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti +```python +conda install pytorch cudatoolkit=10.1 torchvision -c pytorch ``` -Download Waymo open dataset V1.2 [HERE](https://waymo.com/open/download/) and its data split [HERE](https://drive.google.com/drive/folders/18BVuF_RYJF0NjZpt8SnfzANiakoRMf0o?usp=sharing). Then put tfrecord files into corresponding folders in `data/waymo/waymo_format/` and put the data split txt files into `data/waymo/kitti_format/ImageSets`. Download ground truth bin file for validation set [HERE](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects) and put it into `data/waymo/waymo_format/`. A tip is that you can use `gsutil` to download the large-scale dataset with commands. You can take this [tool](https://github.com/RalphMao/Waymo-Dataset-Tool) as an example for more details. Subsequently, prepare waymo data by running +`E.g. 2` If you have CUDA 9.2 installed under `/usr/local/cuda` and would like to install +PyTorch 1.3.1., you need to install the prebuilt PyTorch with CUDA 9.2. -```bash -python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo +```python +conda install pytorch=1.3.1 cudatoolkit=9.2 torchvision=0.4.2 -c pytorch ``` -Note that if your local disk does not have enough space for saving converted data, you can change the `out-dir` to anywhere else. Just remember to create folders and prepare data there in advance and link them back to `data/waymo/kitti_format` after the data conversion. +If you build PyTorch from source instead of installing the prebuilt pacakge, +you can use more CUDA versions such as 9.0. + +c. Install [MMCV](https://mmcv.readthedocs.io/en/latest/). +*mmcv-full* is necessary since MMDetection3D relies on MMDetection, CUDA ops in *mmcv-full* are required. -Download nuScenes V1.0 full dataset data [HERE]( https://www.nuscenes.org/download). Prepare nuscenes data by running +The pre-build *mmcv-full* could be installed by running: (available versions could be found [here](https://mmcv.readthedocs.io/en/latest/#install-with-pip)) -```bash -python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes +```shell +pip install mmcv-full==latest+torch1.5.0+cu101 -f https://download.openmmlab.com/mmcv/dist/index.html ``` -Download Lyft 3D detection data [HERE](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data). 
Prepare Lyft data by running +Optionally, you could also build the full version from source: -```bash -python tools/create_data.py lyft --root-path ./data/lyft --out-dir ./data/lyft --extra-tag lyft --version v1.01 +```shell +pip install mmcv-full ``` -Note that we follow the original folder names for clear organization. Please rename the raw folders as shown above. +d. Install [MMDetection](https://github.com/open-mmlab/mmdetection). -To prepare scannet data, please see [scannet](https://github.com/open-mmlab/mmdetection3d/blob/master/data/scannet/README.md). +```shell +pip install git+https://github.com/open-mmlab/mmdetection.git +``` -To prepare sunrgbd data, please see [sunrgbd](https://github.com/open-mmlab/mmdetection3d/blob/master/data/sunrgbd/README.md). +Optionally, you could also build MMDetection from source in case you want to modify the code: -For using custom datasets, please refer to [Tutorials 2: Adding New Dataset](tutorials/new_dataset.md). +```shell +git clone https://github.com/open-mmlab/mmdetection.git +cd mmdetection +pip install -r requirements/build.txt +pip install -v -e . # or "python setup.py develop" +``` -## Inference with pretrained models +**Important**: -We provide testing scripts to evaluate a whole dataset (SUNRGBD, ScanNet, KITTI, etc.), -and also some high-level apis for easier integration to other projects. +1. The required versions of MMCV and MMDetection for different versions of MMDetection3D are as below. Please install the correct version of MMCV and MMDetection to avoid installation issues. -### Test a dataset +| MMDetection3D version | MMDetection version | MMCV version | +|:-------------------:|:-------------------:|:-------------------:| +| master | mmdet>=2.5.0 | mmcv-full>=1.1.5, <=1.3| +| 0.8.0 | mmdet>=2.5.0 | mmcv-full>=1.1.5, <=1.3| +| 0.7.0 | mmdet>=2.5.0 | mmcv-full>=1.1.5, <=1.3| +| 0.6.0 | mmdet>=2.4.0 | mmcv-full>=1.1.3, <=1.2| +| 0.5.0 | 2.3.0 | mmcv-full==1.0.5| -- single GPU -- single node multiple GPU -- multiple node -You can use the following commands to test a dataset. +e. Clone the MMDetection3D repository. ```shell -# single-gpu testing -python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show] - -# multi-gpu testing -./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] +git clone https://github.com/open-mmlab/mmdetection3d.git +cd mmdetection3d ``` -Optional arguments: -- `RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved to a file. -- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., `proposal_fast`, `proposal`, `bbox`, `segm` are available for COCO, `mAP`, `recall` for PASCAL VOC. Cityscapes could be evaluated by `cityscapes` as well as all COCO metrics. -- `--show`: If specified, detection results will be plotted in the silient mode. It is only applicable to single GPU testing and used for debugging and visualization. This should be used with `--show-dir`. -- `--show-dir`: If specified, detection results will be plotted on the `***_points.obj` and `***_pred.ply` files in the specified directory. It is only applicable to single GPU testing and used for debugging and visualization. You do NOT need a GUI available in your environment for using this option. +f.Install build requirements and then install MMDetection3D. 
-Examples: - -Assume that you have already downloaded the checkpoints to the directory `checkpoints/`. - -1. Test votenet on ScanNet and save the points and prediction visualization results. - - ```shell - python tools/test.py configs/votenet/votenet_8x8_scannet-3d-18class.py \ - checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth \ - --show --show-dir ./data/scannet/show_results - ``` +```shell +pip install -v -e . # or "python setup.py develop" +``` -2. Test votenet on ScanNet, save the points, prediction, groundtruth visualization results, and evaluate the mAP. +Note: - ```shell - python tools/test.py configs/votenet/votenet_8x8_scannet-3d-18class.py \ - checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth \ - --eval mAP - --options 'show=True' 'out_dir=./data/scannet/show_results' - ``` +1. The git commit id will be written to the version number with step d, e.g. 0.6.0+2e7045c. The version will also be saved in trained models. +It is recommended that you run step d each time you pull some updates from github. If C++/CUDA codes are modified, then this step is compulsory. -3. Test votenet on ScanNet (without saving the test results) and evaluate the mAP. + > Important: Be sure to remove the `./build` folder if you reinstall mmdet with a different CUDA/PyTorch version. - ```shell - python tools/test.py configs/votenet/votenet_8x8_scannet-3d-18class.py \ - checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth \ - --eval mAP - ``` + ```shell + pip uninstall mmdet3d + rm -rf ./build + find . -name "*.so" | xargs rm + ``` -4. Test SECOND with 8 GPUs, and evaluate the mAP. +2. Following the above instructions, mmdetection is installed on `dev` mode, any local modifications made to the code will take effect without the need to reinstall it (unless you submit some commits and want to update the version number). - ```shell - ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/second/hv_second_secfpn_6x8_80e_kitti-3d-3class.py \ - checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-3class_20200620_230238-9208083a.pth \ - --out results.pkl --eval mAP - ``` +3. If you would like to use `opencv-python-headless` instead of `opencv-python`, +you can install it before installing MMCV. -5. Test PointPillars on nuscenes with 8 GPUs, and generate the json file to be submit to the official evaluation server. +4. Some dependencies are optional. Simply running `pip install -v -e .` will only install the minimum runtime requirements. To use optional dependencies like `albumentations` and `imagecorruptions` either install them manually with `pip install -r requirements/optional.txt` or specify desired extras when calling `pip` (e.g. `pip install -v -e .[optional]`). Valid keys for the extras field are: `all`, `tests`, `build`, and `optional`. - ```shell - ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d.py \ - checkpoints/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d_20200620_230405-2fa62f3d.pth \ - --format-only --options 'jsonfile_prefix=./pointpillars_nuscenes_results' - ``` +5. The code can not be built for CPU only environment (where CUDA isn't available) for now. - The generated results be under `./pointpillars_nuscenes_results` directory. +## Another option: Docker Image -6. Test SECOND on KITTI with 8 GPUs, and generate the pkl files and submission datas to be submit to the official evaluation server. 
+We provide a [Dockerfile](https://github.com/open-mmlab/mmdetection3d/blob/master/docker/Dockerfile) to build an image. - ```shell - ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/second/hv_second_secfpn_6x8_80e_kitti-3d-3class.py \ - checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-3class_20200620_230238-9208083a.pth \ - --format-only --options 'pklfile_prefix=./second_kitti_results' 'submission_prefix=./second_kitti_results' - ``` +```shell +# build an image with PyTorch 1.6, CUDA 10.1 +docker build -t mmdetection3d docker/ +``` - The generated results be under `./second_kitti_results` directory. +Run it with -7. Test PointPillars on Lyft with 8 GPUs, generate the pkl files and make a submission to the leaderboard. +```shell +docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmdetection3d/data mmdetection3d +``` - ```shell - ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_fpn_sbn-2x8_2x_lyft-3d.py \ - checkpoints/hv_pointpillars_fpn_sbn-2x8_2x_lyft-3d_latest.pth --out results/pp_lyft/results_challenge.pkl \ - --format-only --options 'jsonfile_prefix=results/pp_lyft/results_challenge' \ - 'csv_path=results/pp_lyft/results_challenge.csv' - ``` +## A from-scratch setup script - **Notice**: To generate submissions on Lyft, `csv_path` must be given in the options. After generating the csv file, you can make a submission with kaggle commands given on the [website](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/submit). +Here is a full script for setting up mmdetection with conda. -7. Test PointPillars on waymo with 8 GPUs, and evaluate the mAP with waymo metrics. +```shell +conda create -n open-mmlab python=3.7 -y +conda activate open-mmlab - ```shell - ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car.py \ - checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth --out results/waymo-car/results_eval.pkl \ - --eval waymo --options 'pklfile_prefix=results/waymo-car/kitti_results' \ - 'submission_prefix=results/waymo-car/kitti_results' - ``` +# install latest pytorch prebuilt with the default prebuilt CUDA version (usually the latest) +conda install -c pytorch pytorch torchvision -y - **Notice**: For evaluation on waymo, please follow the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md) to build the binary file `compute_detection_metrics_main` for metrics computation and put it into `mmdet3d/core/evaluation/waymo_utils/`.(Sometimes when using bazel to build `compute_detection_metrics_main`, an error `'round' is not a member of 'std'` may appear. We just need to remove the `std::` before `round` in that file.) `pklfile_prefix` should be given in the options for the bin file generation. For metrics, `waymo` is the recommended official evaluation prototype. Currently, evaluating with choice `kitti` is adapted from KITTI and the results for each difficulty are not exactly the same as the definition of KITTI. Instead, most of objects are marked with difficulty 0 currently, which will be fixed in the future. The reasons of its instability include the large computation for evalution, the lack of occlusion and truncation in the converted data, different definition of difficulty and different methods of computing average precision. +# install mmcv +pip install mmcv-full -8. Test PointPillars on waymo with 8 GPUs, generate the bin files and make a submission to the leaderboard. 
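# Optional sanity check (assumes the `pip install mmcv-full` step above succeeded):
# print the mmcv version and the CUDA version its ops were compiled with.
python -c "import mmcv; print(mmcv.__version__)"
python -c "from mmcv.ops import get_compiling_cuda_version; print(get_compiling_cuda_version())"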
+# install mmdetection +pip install git+https://github.com/open-mmlab/mmdetection.git - ```shell - ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car.py \ - checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth --out results/waymo-car/results_eval.pkl \ - --format-only --options 'pklfile_prefix=results/waymo-car/kitti_results' \ - 'submission_prefix=results/waymo-car/kitti_results' - ``` +# install mmdetection3d +git clone https://github.com/open-mmlab/mmdetection3d.git +cd mmdetection3d +pip install -v -e . +``` - **Notice**: After generating the bin file, you can simply build the binary file `create_submission` and use them to create a submission file by following the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md). For evaluation on the validation set with the eval server, you can also use the same way to generate a submission. +## Using multiple MMDetection3D versions -### Visualization +The train and test scripts already modify the `PYTHONPATH` to ensure the script use the MMDetection3D in the current directory. -To see the SUNRGBD, ScanNet or KITTI points and detection results, you can run the following command +To use the default MMDetection3D installed in the environment rather than that you are working with, you can remove the following line in those scripts -```bash -python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --show --show-dir ${SHOW_DIR} +```shell +PYTHONPATH="$(dirname $0)/..":$PYTHONPATH ``` -Aftering running this command, plotted results ***_points.obj and ***_pred.ply files in `${SHOW_DIR}`. +# Verification -To see the points, detection results and ground truth of SUNRGBD, ScanNet or KITTI during evaluation time, you can run the following command -```bash -python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --eval 'mAP' --options 'show=True' 'out_dir=${SHOW_DIR}' -``` -After running this command, you will obtain ***_points.ob, ***_pred.ply files and ***_gt.ply in `${SHOW_DIR}`. +TBD -You can use 3D visualization software such as the [MeshLab](http://www.meshlab.net/) to open the these files under `${SHOW_DIR}` to see the 3D detection output. Specifically, open `***_points.obj` to see the input point cloud and open `***_pred.ply` to see the predicted 3D bounding boxes. This allows the inference and results generation be done in remote server and the users can open them on their host with GUI. +# Demo -**Notice**: The visualization API is a little unstable since we plan to refactor these parts together with MMDetection in the future. - -### Point cloud demo +## Point cloud demo We provide a demo script to test a single sample. @@ -286,9 +213,9 @@ Examples: convert_ply('./test.ply', './test.bin') ``` -### High-level APIs for testing point clouds +## High-level APIs for testing point clouds -#### Synchronous interface +### Synchronous interface Here is an example of building the model and test given point clouds. ```python @@ -308,180 +235,3 @@ model.show_results(data, result, out_dir='results') ``` A notebook demo can be found in [demo/inference_demo.ipynb](https://github.com/open-mmlab/mmdetection/blob/master/demo/inference_demo.ipynb). - -## Train a model - -MMDetection implements distributed training and non-distributed training, -which uses `MMDistributedDataParallel` and `MMDataParallel` respectively. - -All outputs (log files and checkpoints) will be saved to the working directory, -which is specified by `work_dir` in the config file. 
- -By default we evaluate the model on the validation set after each epoch, you can change the evaluation interval by adding the interval argument in the training config. -```python -evaluation = dict(interval=12) # This evaluate the model per 12 epoch. -``` - -**Important**: The default learning rate in config files is for 8 GPUs and the exact batch size is marked by the config's file name, e.g. '2x8' means 2 samples per GPU using 8 GPUs. -According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you need to set the learning rate proportional to the batch size if you use different GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu. However, since most of the models in this repo use ADAM rather than SGD for optimization, the rule may not hold and users need to tune the learning rate by themselves. - -### Train with a single GPU - -```shell -python tools/train.py ${CONFIG_FILE} [optional arguments] -``` - -If you want to specify the working directory in the command, you can add an argument `--work_dir ${YOUR_WORK_DIR}`. - -### Train with multiple GPUs - -```shell -./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments] -``` - -Optional arguments are: - -- `--no-validate` (**not suggested**): By default, the codebase will perform evaluation at every k (default value is 1, which can be modified like [this](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py#L174)) epochs during the training. To disable this behavior, use `--no-validate`. -- `--work-dir ${WORK_DIR}`: Override the working directory specified in the config file. -- `--resume-from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file. -- `--options 'Key=value'`: Overide some settings in the used config. - -Difference between `resume-from` and `load-from`: -`resume-from` loads both the model weights and optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used for resuming the training process that is interrupted accidentally. -`load-from` only loads the model weights and the training epoch starts from 0. It is usually used for finetuning. - -### Train with multiple machines - -If you run MMDetection on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `slurm_train.sh`. (This script also supports single machine training.) - -```shell -[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} -``` - -Here is an example of using 16 GPUs to train Mask R-CNN on the dev partition. - -```shell -GPUS=16 ./tools/slurm_train.sh dev mask_r50_1x configs/mask_rcnn_r50_fpn_1x_coco.py /nfs/xxxx/mask_rcnn_r50_fpn_1x -``` - -You can check [slurm_train.sh](https://github.com/open-mmlab/mmdetection/blob/master/tools/slurm_train.sh) for full arguments and environment variables. - -If you have just multiple machines connected with ethernet, you can refer to -PyTorch [launch utility](https://pytorch.org/docs/stable/distributed_deprecated.html#launch-utility). -Usually it is slow if you do not have high speed networking like InfiniBand. - -### Launch multiple jobs on a single machine - -If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, -you need to specify different ports (29500 by default) for each job to avoid communication conflict. - -If you use `dist_train.sh` to launch training jobs, you can set the port in commands. 
- -```shell -CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4 -CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4 -``` - -If you use launch training jobs with Slurm, there are two ways to specify the ports. - -1. Set the port through `--options`. This is more recommended since it does not change the original configs. - - ```shell - CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} --options 'dist_params.port=29500' - CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} --options 'dist_params.port=29501' - ``` - -2. Modify the config files (usually the 6th line from the bottom in config files) to set different communication ports. - - In `config1.py`, - - ```python - dist_params = dict(backend='nccl', port=29500) - ``` - - In `config2.py`, - - ```python - dist_params = dict(backend='nccl', port=29501) - ``` - - Then you can launch two jobs with `config1.py` ang `config2.py`. - - ```shell - CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} - CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} - ``` - -## Useful tools - -We provide lots of useful tools under `tools/` directory. - -### Analyze logs - -You can plot loss/mAP curves given a training log file. Run `pip install seaborn` first to install the dependency. - -![loss curve image](../resources/loss_curve.png) - -```shell -python tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}] -``` - -Examples: - -- Plot the classification loss of some run. - - ```shell - python tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls - ``` - -- Plot the classification and regression loss of some run, and save the figure to a pdf. - - ```shell - python tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_bbox --out losses.pdf - ``` - -- Compare the bbox mAP of two runs in the same figure. - - ```shell - python tools/analyze_logs.py plot_curve log1.json log2.json --keys bbox_mAP --legend run1 run2 - ``` - -You can also compute the average training speed. - -```shell -python tools/analyze_logs.py cal_train_time log.json [--include-outliers] -``` - -The output is expected to be like the following. - -``` ------Analyze train time of work_dirs/some_exp/20190611_192040.log.json----- -slowest epoch 11, average time is 1.2024 -fastest epoch 1, average time is 1.1909 -time std over epochs is 0.0028 -average iter time: 1.1959 s/iter - -``` - -### Publish a model - -Before you upload a model to AWS, you may want to -(1) convert model weights to CPU tensors, (2) delete the optimizer states and -(3) compute the hash of the checkpoint file and append the hash id to the filename. - -```shell -python tools/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME} -``` - -E.g., - -```shell -python tools/publish_model.py work_dirs/faster_rcnn/latest.pth faster_rcnn_r50_fpn_1x_20190801.pth -``` - -The final output filename will be `faster_rcnn_r50_fpn_1x_20190801-{hash id}.pth`. - -## Tutorials - -Currently, we provide four tutorials for users to [finetune models](tutorials/finetune.md), [add new dataset](tutorials/new_dataset.md), [design data pipeline](tutorials/data_pipeline.md) and [add new modules](tutorials/new_modules.md). 
-We also provide a full description about the [config system](config.md). From a401b96c30c29451023f76379563983a80d44cf1 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Fri, 18 Dec 2020 20:25:40 +0800 Subject: [PATCH 02/43] Create data_preparation.md --- docs/data_preparation.md | 108 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 108 insertions(+) create mode 100644 docs/data_preparation.md diff --git a/docs/data_preparation.md b/docs/data_preparation.md new file mode 100644 index 0000000000..d47ea0e77e --- /dev/null +++ b/docs/data_preparation.md @@ -0,0 +1,108 @@ +# Prepare datasets + +It is recommended to symlink the dataset root to `$MMDETECTION3D/data`. +If your folder structure is different from the following, you may need to change the corresponding paths in config files. + +``` +mmdetection3d +├── mmdet3d +├── tools +├── configs +├── data +│ ├── nuscenes +│ │ ├── maps +│ │ ├── samples +│ │ ├── sweeps +│ │ ├── v1.0-test +| | ├── v1.0-trainval +│ ├── kitti +│ │ ├── ImageSets +│ │ ├── testing +│ │ │ ├── calib +│ │ │ ├── image_2 +│ │ │ ├── velodyne +│ │ ├── training +│ │ │ ├── calib +│ │ │ ├── image_2 +│ │ │ ├── label_2 +│ │ │ ├── velodyne +│ ├── waymo +│ │ ├── waymo_format +│ │ │ ├── training +│ │ │ ├── validation +│ │ │ ├── testing +│ │ │ ├── gt.bin +│ │ ├── kitti_format +│ │ │ ├── ImageSets +│ ├── lyft +│ │ ├── v1.01-train +│ │ │ ├── v1.01-train (train_data) +│ │ │ ├── lidar (train_lidar) +│ │ │ ├── images (train_images) +│ │ │ ├── maps (train_maps) +│ │ ├── v1.01-test +│ │ │ ├── v1.01-test (test_data) +│ │ │ ├── lidar (test_lidar) +│ │ │ ├── images (test_images) +│ │ │ ├── maps (test_maps) +│ │ ├── train.txt +│ │ ├── val.txt +│ │ ├── test.txt +│ │ ├── sample_submission.csv +│ ├── scannet +│ │ ├── meta_data +│ │ ├── scans +│ │ ├── batch_load_scannet_data.py +│ │ ├── load_scannet_data.py +│ │ ├── scannet_utils.py +│ │ ├── README.md +│ ├── sunrgbd +│ │ ├── OFFICIAL_SUNRGBD +│ │ ├── matlab +│ │ ├── sunrgbd_data.py +│ │ ├── sunrgbd_utils.py +│ │ ├── README.md + +``` + +Download KITTI 3D detection data [HERE](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d). Prepare kitti data by running + +```bash +mkdir ./data/kitti/ && mkdir ./data/kitti/ImageSets + +# Download data split +wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/test.txt +wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/train.txt +wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/val.txt +wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/trainval.txt + +python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti +``` + +Download Waymo open dataset V1.2 [HERE](https://waymo.com/open/download/) and its data split [HERE](https://drive.google.com/drive/folders/18BVuF_RYJF0NjZpt8SnfzANiakoRMf0o?usp=sharing). Then put tfrecord files into corresponding folders in `data/waymo/waymo_format/` and put the data split txt files into `data/waymo/kitti_format/ImageSets`. 
Download ground truth bin file for validation set [HERE](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects) and put it into `data/waymo/waymo_format/`. A tip is that you can use `gsutil` to download the large-scale dataset with commands. You can take this [tool](https://github.com/RalphMao/Waymo-Dataset-Tool) as an example for more details. Subsequently, prepare waymo data by running + +```bash +python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo +``` + +Note that if your local disk does not have enough space for saving converted data, you can change the `out-dir` to anywhere else. Just remember to create folders and prepare data there in advance and link them back to `data/waymo/kitti_format` after the data conversion. + +Download nuScenes V1.0 full dataset data [HERE]( https://www.nuscenes.org/download). Prepare nuscenes data by running + +```bash +python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes +``` + +Download Lyft 3D detection data [HERE](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data). Prepare Lyft data by running + +```bash +python tools/create_data.py lyft --root-path ./data/lyft --out-dir ./data/lyft --extra-tag lyft --version v1.01 +``` + +Note that we follow the original folder names for clear organization. Please rename the raw folders as shown above. + +To prepare scannet data, please see [scannet](https://github.com/open-mmlab/mmdetection3d/blob/master/data/scannet/README.md). + +To prepare sunrgbd data, please see [sunrgbd](https://github.com/open-mmlab/mmdetection3d/blob/master/data/sunrgbd/README.md). + +For using custom datasets, please refer to [Tutorials 2: Customize Datasets](tutorials/new_dataset.md). From 31644eaee0b99d0e33b41bd1ddf37a195a1208b4 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Fri, 18 Dec 2020 20:38:13 +0800 Subject: [PATCH 03/43] Refactor model_zoo.md --- docs/model_zoo.md | 30 ++++++++++++++---------------- 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/docs/model_zoo.md b/docs/model_zoo.md index 603538638b..a8f65ba2d6 100644 --- a/docs/model_zoo.md +++ b/docs/model_zoo.md @@ -1,55 +1,53 @@ -# Model Zoo - -## Common settings +# Common settings - We use distributed training. - For fair comparison with other codebases, we report the GPU memory as the maximum value of `torch.cuda.max_memory_allocated()` for all 8 GPUs. Note that this value is usually less than what `nvidia-smi` shows. - We report the inference time as the total time of network forwarding and post-processing, excluding the data loading time. Results are obtained with the script [benchmark.py](https://github.com/open-mmlab/mmdetection/blob/master/tools/benchmark.py) which computes the average time on 2000 images. -## Baselines +# Baselines -### SECOND +## SECOND Please refer to [SECOND](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/second) for details. We provide SECOND baselines on KITTI and Waymo datasets. -### PointPillars +## PointPillars Please refer to [PointPillars](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars) for details. We provide pointpillars baselines on KITTI, nuScenes, Lyft, and Waymo datasets. -### Part-A2 +## Part-A2 Please refer to [Part-A2](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/parta2) for details. 
-### VoteNet +## VoteNet Please refer to [VoteNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/votenet) for details. We provide VoteNet baselines on ScanNet and SUNRGBD datasets. -### Dynamic Voxelization +## Dynamic Voxelization Please refer to [Dynamic Voxelization](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/dynamic_voxelization) for details. -### MVXNet +## MVXNet Please refer to [MVXNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/mvxnet) for details. -### RegNetX +## RegNetX Please refer to [RegNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/regnet) for details. We provide pointpillars baselines with RegNetX backbones on nuScenes and Lyft datasets currently. -### nuImages +## nuImages We also support baseline models on [nuImages dataset](https://www.nuscenes.org/nuimages). Please refer to [nuImages](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/nuimages) for details. We report Mask R-CNN, Cascade Mask R-CNN and HTC results currently. -### H3DNet +## H3DNet Please refer to [H3DNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/h3dnet) for details. -### 3DSSD +## 3DSSD Please refer to [3DSSD](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/3dssd) for details. -### CenterPoint +## CenterPoint Please refer to [CenterPoint](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/centerpoint) for details. -### SSN +## SSN Please refer to [SSN](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/ssn) for details. We provide pointpillars with shape-aware grouping heads used in SSN on the nuScenes and Lyft dataset currently. From afd03b59941e426c9799d7f87e54a398cf1160e1 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Fri, 18 Dec 2020 20:39:06 +0800 Subject: [PATCH 04/43] Remove the title --- docs/data_preparation.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/docs/data_preparation.md b/docs/data_preparation.md index d47ea0e77e..eaa982099c 100644 --- a/docs/data_preparation.md +++ b/docs/data_preparation.md @@ -1,5 +1,3 @@ -# Prepare datasets - It is recommended to symlink the dataset root to `$MMDETECTION3D/data`. If your folder structure is different from the following, you may need to change the corresponding paths in config files. From fe12cd7cf7b8a36f5218b15cfeab692552340ae8 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Fri, 18 Dec 2020 20:47:24 +0800 Subject: [PATCH 05/43] Put Data Preparation under Get Started --- docs/data_preparation.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/data_preparation.md b/docs/data_preparation.md index eaa982099c..1e1aa303b3 100644 --- a/docs/data_preparation.md +++ b/docs/data_preparation.md @@ -1,3 +1,5 @@ +# Data Preparation + It is recommended to symlink the dataset root to `$MMDETECTION3D/data`. If your folder structure is different from the following, you may need to change the corresponding paths in config files. 
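As a concrete sketch of the symlink layout recommended above (`/path/to/datasets` is only a placeholder for wherever the raw datasets actually live):

```shell
# Link existing dataset directories into the repository's data/ folder.
cd mmdetection3d
mkdir -p data
ln -s /path/to/datasets/kitti data/kitti
ln -s /path/to/datasets/nuscenes data/nuscenes
```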
From d473faece416ddf49ad02129ac05b46ffd667559 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Fri, 18 Dec 2020 20:56:41 +0800 Subject: [PATCH 06/43] Adjust the level of Model Zoo --- docs/model_zoo.md | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/docs/model_zoo.md b/docs/model_zoo.md index a8f65ba2d6..603538638b 100644 --- a/docs/model_zoo.md +++ b/docs/model_zoo.md @@ -1,53 +1,55 @@ -# Common settings +# Model Zoo + +## Common settings - We use distributed training. - For fair comparison with other codebases, we report the GPU memory as the maximum value of `torch.cuda.max_memory_allocated()` for all 8 GPUs. Note that this value is usually less than what `nvidia-smi` shows. - We report the inference time as the total time of network forwarding and post-processing, excluding the data loading time. Results are obtained with the script [benchmark.py](https://github.com/open-mmlab/mmdetection/blob/master/tools/benchmark.py) which computes the average time on 2000 images. -# Baselines +## Baselines -## SECOND +### SECOND Please refer to [SECOND](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/second) for details. We provide SECOND baselines on KITTI and Waymo datasets. -## PointPillars +### PointPillars Please refer to [PointPillars](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars) for details. We provide pointpillars baselines on KITTI, nuScenes, Lyft, and Waymo datasets. -## Part-A2 +### Part-A2 Please refer to [Part-A2](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/parta2) for details. -## VoteNet +### VoteNet Please refer to [VoteNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/votenet) for details. We provide VoteNet baselines on ScanNet and SUNRGBD datasets. -## Dynamic Voxelization +### Dynamic Voxelization Please refer to [Dynamic Voxelization](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/dynamic_voxelization) for details. -## MVXNet +### MVXNet Please refer to [MVXNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/mvxnet) for details. -## RegNetX +### RegNetX Please refer to [RegNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/regnet) for details. We provide pointpillars baselines with RegNetX backbones on nuScenes and Lyft datasets currently. -## nuImages +### nuImages We also support baseline models on [nuImages dataset](https://www.nuscenes.org/nuimages). Please refer to [nuImages](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/nuimages) for details. We report Mask R-CNN, Cascade Mask R-CNN and HTC results currently. -## H3DNet +### H3DNet Please refer to [H3DNet](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/h3dnet) for details. -## 3DSSD +### 3DSSD Please refer to [3DSSD](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/3dssd) for details. -## CenterPoint +### CenterPoint Please refer to [CenterPoint](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/centerpoint) for details. -## SSN +### SSN Please refer to [SSN](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/ssn) for details. We provide pointpillars with shape-aware grouping heads used in SSN on the nuScenes and Lyft dataset currently. 
From 34b4678e4d99e9629518e17db85ee2690481d7cd Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Fri, 18 Dec 2020 21:20:22 +0800 Subject: [PATCH 07/43] Create 1_exist_data_model.md --- docs/1_exist_data_model.md | 222 +++++++++++++++++++++++++++++++++++++ 1 file changed, 222 insertions(+) create mode 100644 docs/1_exist_data_model.md diff --git a/docs/1_exist_data_model.md b/docs/1_exist_data_model.md new file mode 100644 index 0000000000..2a01b5358f --- /dev/null +++ b/docs/1_exist_data_model.md @@ -0,0 +1,222 @@ +# 1. Inference and train with existing models and standard datasets + +## Inference with existing models + +Here we provide testing scripts to evaluate a whole dataset (SUNRGBD, ScanNet, KITTI, etc.). + +For high-level apis easier to integrated into other projects and basic demos, please refer to Demo under [Get Started](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/getting_started.md). + +### Test existing models on standard datasets + +- single GPU +- single node multiple GPU +- multiple node + +You can use the following commands to test a dataset. + +```shell +# single-gpu testing +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show] + +# multi-gpu testing +./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] +``` + +Optional arguments: +- `RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved to a file. +- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., `proposal_fast`, `proposal`, `bbox`, `segm` are available for COCO, `mAP`, `recall` for PASCAL VOC. Cityscapes could be evaluated by `cityscapes` as well as all COCO metrics. +- `--show`: If specified, detection results will be plotted in the silient mode. It is only applicable to single GPU testing and used for debugging and visualization. This should be used with `--show-dir`. +- `--show-dir`: If specified, detection results will be plotted on the `***_points.obj` and `***_pred.ply` files in the specified directory. It is only applicable to single GPU testing and used for debugging and visualization. You do NOT need a GUI available in your environment for using this option. + +Examples: + +Assume that you have already downloaded the checkpoints to the directory `checkpoints/`. + +1. Test votenet on ScanNet and save the points and prediction visualization results. + + ```shell + python tools/test.py configs/votenet/votenet_8x8_scannet-3d-18class.py \ + checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth \ + --show --show-dir ./data/scannet/show_results + ``` + +2. Test votenet on ScanNet, save the points, prediction, groundtruth visualization results, and evaluate the mAP. + + ```shell + python tools/test.py configs/votenet/votenet_8x8_scannet-3d-18class.py \ + checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth \ + --eval mAP + --options 'show=True' 'out_dir=./data/scannet/show_results' + ``` + +3. Test votenet on ScanNet (without saving the test results) and evaluate the mAP. + + ```shell + python tools/test.py configs/votenet/votenet_8x8_scannet-3d-18class.py \ + checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth \ + --eval mAP + ``` + +4. Test SECOND with 8 GPUs, and evaluate the mAP. 
+ + ```shell + ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/second/hv_second_secfpn_6x8_80e_kitti-3d-3class.py \ + checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-3class_20200620_230238-9208083a.pth \ + --out results.pkl --eval mAP + ``` + +5. Test PointPillars on nuscenes with 8 GPUs, and generate the json file to be submit to the official evaluation server. + + ```shell + ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d.py \ + checkpoints/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d_20200620_230405-2fa62f3d.pth \ + --format-only --options 'jsonfile_prefix=./pointpillars_nuscenes_results' + ``` + + The generated results be under `./pointpillars_nuscenes_results` directory. + +6. Test SECOND on KITTI with 8 GPUs, and generate the pkl files and submission datas to be submit to the official evaluation server. + + ```shell + ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/second/hv_second_secfpn_6x8_80e_kitti-3d-3class.py \ + checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-3class_20200620_230238-9208083a.pth \ + --format-only --options 'pklfile_prefix=./second_kitti_results' 'submission_prefix=./second_kitti_results' + ``` + + The generated results be under `./second_kitti_results` directory. + +7. Test PointPillars on Lyft with 8 GPUs, generate the pkl files and make a submission to the leaderboard. + + ```shell + ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_fpn_sbn-2x8_2x_lyft-3d.py \ + checkpoints/hv_pointpillars_fpn_sbn-2x8_2x_lyft-3d_latest.pth --out results/pp_lyft/results_challenge.pkl \ + --format-only --options 'jsonfile_prefix=results/pp_lyft/results_challenge' \ + 'csv_path=results/pp_lyft/results_challenge.csv' + ``` + + **Notice**: To generate submissions on Lyft, `csv_path` must be given in the options. After generating the csv file, you can make a submission with kaggle commands given on the [website](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/submit). + +7. Test PointPillars on waymo with 8 GPUs, and evaluate the mAP with waymo metrics. + + ```shell + ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car.py \ + checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth --out results/waymo-car/results_eval.pkl \ + --eval waymo --options 'pklfile_prefix=results/waymo-car/kitti_results' \ + 'submission_prefix=results/waymo-car/kitti_results' + ``` + + **Notice**: For evaluation on waymo, please follow the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md) to build the binary file `compute_detection_metrics_main` for metrics computation and put it into `mmdet3d/core/evaluation/waymo_utils/`.(Sometimes when using bazel to build `compute_detection_metrics_main`, an error `'round' is not a member of 'std'` may appear. We just need to remove the `std::` before `round` in that file.) `pklfile_prefix` should be given in the options for the bin file generation. For metrics, `waymo` is the recommended official evaluation prototype. Currently, evaluating with choice `kitti` is adapted from KITTI and the results for each difficulty are not exactly the same as the definition of KITTI. Instead, most of objects are marked with difficulty 0 currently, which will be fixed in the future. 
The reasons of its instability include the large computation for evalution, the lack of occlusion and truncation in the converted data, different definition of difficulty and different methods of computing average precision. + +8. Test PointPillars on waymo with 8 GPUs, generate the bin files and make a submission to the leaderboard. + + ```shell + ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car.py \ + checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth --out results/waymo-car/results_eval.pkl \ + --format-only --options 'pklfile_prefix=results/waymo-car/kitti_results' \ + 'submission_prefix=results/waymo-car/kitti_results' + ``` + + **Notice**: After generating the bin file, you can simply build the binary file `create_submission` and use them to create a submission file by following the [instruction](https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md). For evaluation on the validation set with the eval server, you can also use the same way to generate a submission. + +## Train predefined models on standard datasets + +MMDetection implements distributed training and non-distributed training, +which uses `MMDistributedDataParallel` and `MMDataParallel` respectively. + +All outputs (log files and checkpoints) will be saved to the working directory, +which is specified by `work_dir` in the config file. + +By default we evaluate the model on the validation set after each epoch, you can change the evaluation interval by adding the interval argument in the training config. +```python +evaluation = dict(interval=12) # This evaluate the model per 12 epoch. +``` + +**Important**: The default learning rate in config files is for 8 GPUs and the exact batch size is marked by the config's file name, e.g. '2x8' means 2 samples per GPU using 8 GPUs. +According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you need to set the learning rate proportional to the batch size if you use different GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu. However, since most of the models in this repo use ADAM rather than SGD for optimization, the rule may not hold and users need to tune the learning rate by themselves. + +### Train with a single GPU + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +If you want to specify the working directory in the command, you can add an argument `--work_dir ${YOUR_WORK_DIR}`. + +### Train with multiple GPUs + +```shell +./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments] +``` + +Optional arguments are: + +- `--no-validate` (**not suggested**): By default, the codebase will perform evaluation at every k (default value is 1, which can be modified like [this](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py#L174)) epochs during the training. To disable this behavior, use `--no-validate`. +- `--work-dir ${WORK_DIR}`: Override the working directory specified in the config file. +- `--resume-from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file. +- `--options 'Key=value'`: Overide some settings in the used config. + +Difference between `resume-from` and `load-from`: +`resume-from` loads both the model weights and optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used for resuming the training process that is interrupted accidentally. 
+`load-from` only loads the model weights and the training epoch starts from 0. It is usually used for finetuning. + +### Train with multiple machines + +If you run MMDetection on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `slurm_train.sh`. (This script also supports single machine training.) + +```shell +[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} +``` + +Here is an example of using 16 GPUs to train Mask R-CNN on the dev partition. + +```shell +GPUS=16 ./tools/slurm_train.sh dev mask_r50_1x configs/mask_rcnn_r50_fpn_1x_coco.py /nfs/xxxx/mask_rcnn_r50_fpn_1x +``` + +You can check [slurm_train.sh](https://github.com/open-mmlab/mmdetection/blob/master/tools/slurm_train.sh) for full arguments and environment variables. + +If you have just multiple machines connected with ethernet, you can refer to +PyTorch [launch utility](https://pytorch.org/docs/stable/distributed_deprecated.html#launch-utility). +Usually it is slow if you do not have high speed networking like InfiniBand. + +### Launch multiple jobs on a single machine + +If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, +you need to specify different ports (29500 by default) for each job to avoid communication conflict. + +If you use `dist_train.sh` to launch training jobs, you can set the port in commands. + +```shell +CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4 +CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4 +``` + +If you use launch training jobs with Slurm, there are two ways to specify the ports. + +1. Set the port through `--options`. This is more recommended since it does not change the original configs. + + ```shell + CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} --options 'dist_params.port=29500' + CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} --options 'dist_params.port=29501' + ``` + +2. Modify the config files (usually the 6th line from the bottom in config files) to set different communication ports. + + In `config1.py`, + + ```python + dist_params = dict(backend='nccl', port=29500) + ``` + + In `config2.py`, + + ```python + dist_params = dict(backend='nccl', port=29501) + ``` + + Then you can launch two jobs with `config1.py` ang `config2.py`. 
+ + ```shell + CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} + CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} + ``` From 9d879db4ffcb5cabdda27a2ebec992b7b42dec54 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Fri, 18 Dec 2020 21:21:39 +0800 Subject: [PATCH 08/43] Create 2_new_data_model.md --- docs/2_new_data_model.md | 9 +++++++++ 1 file changed, 9 insertions(+) create mode 100644 docs/2_new_data_model.md diff --git a/docs/2_new_data_model.md b/docs/2_new_data_model.md new file mode 100644 index 0000000000..62621b3fa6 --- /dev/null +++ b/docs/2_new_data_model.md @@ -0,0 +1,9 @@ +# 2: Train with customized datasets + +## Prepare the customized dataset + +## Prepare a config + +## Train a new model + +## Test and inference From b180448dae0a3feb3fecbbf071f3d87f5ace7e91 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Fri, 18 Dec 2020 21:21:52 +0800 Subject: [PATCH 09/43] Update 1_exist_data_model.md --- docs/1_exist_data_model.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/1_exist_data_model.md b/docs/1_exist_data_model.md index 2a01b5358f..a0752c3619 100644 --- a/docs/1_exist_data_model.md +++ b/docs/1_exist_data_model.md @@ -1,4 +1,4 @@ -# 1. Inference and train with existing models and standard datasets +# 1: Inference and train with existing models and standard datasets ## Inference with existing models From f08e136f0d22f5c50ce2ff88ac139b628f6d1cb6 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Fri, 18 Dec 2020 21:32:39 +0800 Subject: [PATCH 10/43] Create useful_tools.md --- docs/useful_tools.md | 154 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 154 insertions(+) create mode 100644 docs/useful_tools.md diff --git a/docs/useful_tools.md b/docs/useful_tools.md new file mode 100644 index 0000000000..663f305fa9 --- /dev/null +++ b/docs/useful_tools.md @@ -0,0 +1,154 @@ +# Useful Tools and Scripts + +We provide lots of useful tools under `tools/` directory. + +## Log Analysis + +You can plot loss/mAP curves given a training log file. Run `pip install seaborn` first to install the dependency. + +![loss curve image](../resources/loss_curve.png) + +```shell +python tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}] +``` + +Examples: + +- Plot the classification loss of some run. + + ```shell + python tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls + ``` + +- Plot the classification and regression loss of some run, and save the figure to a pdf. + + ```shell + python tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_bbox --out losses.pdf + ``` + +- Compare the bbox mAP of two runs in the same figure. + + ```shell + python tools/analyze_logs.py plot_curve log1.json log2.json --keys bbox_mAP --legend run1 run2 + ``` + +You can also compute the average training speed. + +```shell +python tools/analyze_logs.py cal_train_time log.json [--include-outliers] +``` + +The output is expected to be like the following. 
+
+```
+-----Analyze train time of work_dirs/some_exp/20190611_192040.log.json-----
+slowest epoch 11, average time is 1.2024
+fastest epoch 1, average time is 1.1909
+time std over epochs is 0.0028
+average iter time: 1.1959 s/iter
+```
+
+## Visualization
+
+To see the SUNRGBD, ScanNet or KITTI points and detection results, you can run the following command
+
+```bash
+python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --show --show-dir ${SHOW_DIR}
+```
+
+After running this command, the plotted results, i.e. the `***_points.obj` and `***_pred.ply` files, will be saved in `${SHOW_DIR}`.
+
+To see the points, detection results and ground truth of SUNRGBD, ScanNet or KITTI during evaluation time, you can run the following command
+
+```bash
+python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --eval 'mAP' --options 'show=True' 'out_dir=${SHOW_DIR}'
+```
+
+After running this command, you will obtain the `***_points.obj`, `***_pred.ply` and `***_gt.ply` files in `${SHOW_DIR}`.
+
+You can use 3D visualization software such as [MeshLab](http://www.meshlab.net/) to open these files under `${SHOW_DIR}` to see the 3D detection output. Specifically, open `***_points.obj` to see the input point cloud and open `***_pred.ply` to see the predicted 3D bounding boxes. This allows the inference and results generation to be done on a remote server, while users can open the results on their host machines with a GUI.
+
+**Notice**: The visualization API is a little unstable since we plan to refactor these parts together with MMDetection in the future.
+
+## Model Complexity
+
+`tools/get_flops.py` is a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch) to compute the FLOPs and params of a given model.
+
+```shell
+python tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
+```
+
+You will get results like this.
+
+```text
+==============================
+Input shape: (3, 1280, 800)
+Flops: 239.32 GFLOPs
+Params: 37.74 M
+==============================
+```
+
+**Note**: This tool is still experimental and we do not guarantee that the
+ number is absolutely correct. You may use the result for simple
+ comparisons, but double check it before you adopt it in technical reports or papers.
+
+1. FLOPs are related to the input shape while parameters are not. The default
+ input shape is (1, 3, 1280, 800).
+2. Some operators are not counted into FLOPs like GN and custom operators. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/cnn/utils/flops_counter.py) for details.
+3. The FLOPs of two-stage detectors are dependent on the number of proposals.
+
+## Model Conversion
+
+### RegNet model to MMDetection
+
+`tools/regnet2mmdet.py` converts keys in pycls pretrained RegNet models to
+ MMDetection style.
+
+```shell
+python tools/regnet2mmdet.py ${SRC} ${DST} [-h]
+```
+
+### Detectron ResNet to PyTorch
+
+`tools/detectron2pytorch.py` converts keys in the original detectron pretrained
+ ResNet models to PyTorch style.
+
+```shell
+python tools/detectron2pytorch.py ${SRC} ${DST} ${DEPTH} [-h]
+```
+
+### Prepare a model for publishing
+
+`tools/publish_model.py` helps users to prepare their model for publishing.
+
+Before you upload a model to AWS, you may want to
+
+1. convert model weights to CPU tensors
+2. delete the optimizer states and
+3. compute the hash of the checkpoint file and append the hash id to the
+ filename.
+ +```shell +python tools/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME} +``` + +E.g., + +```shell +python tools/publish_model.py work_dirs/faster_rcnn/latest.pth faster_rcnn_r50_fpn_1x_20190801.pth +``` + +The final output filename will be `faster_rcnn_r50_fpn_1x_20190801-{hash id}.pth`. + +## Dataset Conversion + +TBD + +## Miscellaneous + +### Print the entire config + +`tools/print_config.py` prints the whole config verbatim, expanding all its + imports. + +```shell +python tools/print_config.py ${CONFIG} [-h] [--options ${OPTIONS [OPTIONS...]}] +``` From 5db81aaac9ae4febe511da5abd8dcc75a5e0a16c Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Fri, 18 Dec 2020 21:32:47 +0800 Subject: [PATCH 11/43] Update index.rst --- docs/index.rst | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/docs/index.rst b/docs/index.rst index 03642c8b23..f4aebd3511 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -3,17 +3,18 @@ Welcome to MMDetection3D's documentation! .. toctree:: :maxdepth: 2 + :caption: Get Started - install.md getting_started.md + data_preparation.md model_zoo.md - + .. toctree:: :maxdepth: 2 - :caption: Notes + :caption: Quick Run - benchmarks.md - config.md + 1_exist_data_model.md + 2_new_data_model.md .. toctree:: :maxdepth: 2 @@ -21,6 +22,18 @@ Welcome to MMDetection3D's documentation! tutorials/index.rst +.. toctree:: + :maxdepth: 2 + :caption: Useful Tools and Scripts + + useful_tools.md + +.. toctree:: + :maxdepth: 2 + :caption: Notes + + benchmarks.md + .. toctree:: :caption: API Reference From c5f00231dde36eaa333c880a9c177c3a929139ca Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Fri, 18 Dec 2020 21:37:41 +0800 Subject: [PATCH 12/43] Delete install.md --- docs/install.md | 172 ------------------------------------------------ 1 file changed, 172 deletions(-) delete mode 100644 docs/install.md diff --git a/docs/install.md b/docs/install.md deleted file mode 100644 index e40dd15432..0000000000 --- a/docs/install.md +++ /dev/null @@ -1,172 +0,0 @@ -## Installation - -### Requirements - -- Linux or macOS (Windows is not currently officially supported) -- Python 3.6+ -- PyTorch 1.3+ -- CUDA 9.2+ (If you build PyTorch from source, CUDA 9.0 is also compatible) -- GCC 5+ -- [mmcv](https://github.com/open-mmlab/mmcv) - - -### Install mmdetection - -a. Create a conda virtual environment and activate it. - -```shell -conda create -n open-mmlab python=3.7 -y -conda activate open-mmlab -``` - -b. Install PyTorch and torchvision following the [official instructions](https://pytorch.org/), e.g., - -```shell -conda install pytorch torchvision -c pytorch -``` - -Note: Make sure that your compilation CUDA version and runtime CUDA version match. -You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/). - -`E.g.1` If you have CUDA 10.1 installed under `/usr/local/cuda` and would like to install -PyTorch 1.5, you need to install the prebuilt PyTorch with CUDA 10.1. - -```python -conda install pytorch cudatoolkit=10.1 torchvision -c pytorch -``` - -`E.g. 2` If you have CUDA 9.2 installed under `/usr/local/cuda` and would like to install -PyTorch 1.3.1., you need to install the prebuilt PyTorch with CUDA 9.2. 
- -```python -conda install pytorch=1.3.1 cudatoolkit=9.2 torchvision=0.4.2 -c pytorch -``` - -If you build PyTorch from source instead of installing the prebuilt pacakge, -you can use more CUDA versions such as 9.0. - -c. Install [MMCV](https://mmcv.readthedocs.io/en/latest/). -*mmcv-full* is necessary since MMDetection3D relies on MMDetection, CUDA ops in *mmcv-full* are required. - -The pre-build *mmcv-full* could be installed by running: (available versions could be found [here](https://mmcv.readthedocs.io/en/latest/#install-with-pip)) - -```shell -pip install mmcv-full==latest+torch1.5.0+cu101 -f https://download.openmmlab.com/mmcv/dist/index.html -``` - -Optionally, you could also build the full version from source: - -```shell -pip install mmcv-full -``` - -d. Install [MMDetection](https://github.com/open-mmlab/mmdetection). - -```shell -pip install git+https://github.com/open-mmlab/mmdetection.git -``` - -Optionally, you could also build MMDetection from source in case you want to modify the code: - -```shell -git clone https://github.com/open-mmlab/mmdetection.git -cd mmdetection -pip install -r requirements/build.txt -pip install -v -e . # or "python setup.py develop" -``` - -**Important**: - -1. The required versions of MMCV and MMDetection for different versions of MMDetection3D are as below. Please install the correct version of MMCV and MMDetection to avoid installation issues. - -| MMDetection3D version | MMDetection version | MMCV version | -|:-------------------:|:-------------------:|:-------------------:| -| master | mmdet>=2.5.0 | mmcv-full>=1.1.5, <=1.3| -| 0.8.0 | mmdet>=2.5.0 | mmcv-full>=1.1.5, <=1.3| -| 0.7.0 | mmdet>=2.5.0 | mmcv-full>=1.1.5, <=1.3| -| 0.6.0 | mmdet>=2.4.0 | mmcv-full>=1.1.3, <=1.2| -| 0.5.0 | 2.3.0 | mmcv-full==1.0.5| - - -e. Clone the MMDetection3D repository. - -```shell -git clone https://github.com/open-mmlab/mmdetection3d.git -cd mmdetection3d -``` - -f.Install build requirements and then install MMDetection3D. - -```shell -pip install -v -e . # or "python setup.py develop" -``` - -Note: - -1. The git commit id will be written to the version number with step d, e.g. 0.6.0+2e7045c. The version will also be saved in trained models. -It is recommended that you run step d each time you pull some updates from github. If C++/CUDA codes are modified, then this step is compulsory. - - > Important: Be sure to remove the `./build` folder if you reinstall mmdet with a different CUDA/PyTorch version. - - ```shell - pip uninstall mmdet3d - rm -rf ./build - find . -name "*.so" | xargs rm - ``` - -2. Following the above instructions, mmdetection is installed on `dev` mode, any local modifications made to the code will take effect without the need to reinstall it (unless you submit some commits and want to update the version number). - -3. If you would like to use `opencv-python-headless` instead of `opencv-python`, -you can install it before installing MMCV. - -4. Some dependencies are optional. Simply running `pip install -v -e .` will only install the minimum runtime requirements. To use optional dependencies like `albumentations` and `imagecorruptions` either install them manually with `pip install -r requirements/optional.txt` or specify desired extras when calling `pip` (e.g. `pip install -v -e .[optional]`). Valid keys for the extras field are: `all`, `tests`, `build`, and `optional`. - -5. The code can not be built for CPU only environment (where CUDA isn't available) for now. 
- -### Another option: Docker Image - -We provide a [Dockerfile](https://github.com/open-mmlab/mmdetection3d/blob/master/docker/Dockerfile) to build an image. - -```shell -# build an image with PyTorch 1.6, CUDA 10.1 -docker build -t mmdetection3d docker/ -``` - -Run it with - -```shell -docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmdetection3d/data mmdetection3d -``` - -### A from-scratch setup script - -Here is a full script for setting up mmdetection with conda. - -```shell -conda create -n open-mmlab python=3.7 -y -conda activate open-mmlab - -# install latest pytorch prebuilt with the default prebuilt CUDA version (usually the latest) -conda install -c pytorch pytorch torchvision -y - -# install mmcv -pip install mmcv-full - -# install mmdetection -pip install git+https://github.com/open-mmlab/mmdetection.git - -# install mmdetection3d -git clone https://github.com/open-mmlab/mmdetection3d.git -cd mmdetection3d -pip install -v -e . -``` - -### Using multiple MMDetection3D versions - -The train and test scripts already modify the `PYTHONPATH` to ensure the script use the MMDetection3D in the current directory. - -To use the default MMDetection3D installed in the environment rather than that you are working with, you can remove the following line in those scripts - -```shell -PYTHONPATH="$(dirname $0)/..":$PYTHONPATH -``` From 485a249ec89cdc0b0c44e9f2dfe87e5f3c0fa500 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Mon, 21 Dec 2020 16:14:15 +0800 Subject: [PATCH 13/43] Adjust the order of get started subsections --- docs/index.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/index.rst b/docs/index.rst index f4aebd3511..6344f4110d 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -6,8 +6,8 @@ Welcome to MMDetection3D's documentation! :caption: Get Started getting_started.md - data_preparation.md model_zoo.md + data_preparation.md .. toctree:: :maxdepth: 2 From e6c799817785e89c1324de8b84ee0919cb599abe Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Mon, 21 Dec 2020 19:57:55 +0800 Subject: [PATCH 14/43] First complete version for customized datasets --- docs/2_new_data_model.md | 93 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 93 insertions(+) diff --git a/docs/2_new_data_model.md b/docs/2_new_data_model.md index 62621b3fa6..8a2d6facd7 100644 --- a/docs/2_new_data_model.md +++ b/docs/2_new_data_model.md @@ -1,9 +1,102 @@ # 2: Train with customized datasets +In this note, you will know how to train and test predefined models with customized datasets. We use the Waymo dataset as an example to describe the whole process. + +The basic steps are as below: + +1. Prepare the customized dataset +2. Prepare a config +3. Train, test, inference models on the customized dataset. + ## Prepare the customized dataset +There are three ways to support a new dataset in MMDetection3D: + +1. reorganize the dataset into existed format. +2. reorganize the dataset into a middle format. +3. implement a new dataset. + +Usually we recommend to use the first two methods which are usually easier than the third. + +In this note, we give an example for converting the data into KITTI format. + +**Note**: We take Waymo as the example here considering its format is totally different from other existed formats. For other datasets using similar methods to organize data, like Lyft compared to nuScenes, it would be easier to directly implement the new dataset inherited from an existed one. 
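+
+As a quick illustration of this last point, if the new data is organized almost the same as an existing dataset, the customized dataset class can usually be implemented by inheriting from the existing one and registering it. The snippet below is only a minimal sketch under this assumption; the class name `MyKittiLikeDataset` and its class list are hypothetical, and it assumes the registry interface used by the existing dataset classes in `mmdet3d/datasets`.
+
+```python
+from mmdet.datasets import DATASETS  # dataset registry shared with MMDetection
+
+from mmdet3d.datasets import KittiDataset
+
+
+@DATASETS.register_module()
+class MyKittiLikeDataset(KittiDataset):
+    """Hypothetical dataset that reuses the KITTI-style loading logic."""
+
+    # Only the category names differ from KITTI in this sketch; data loading
+    # and evaluation are inherited from KittiDataset unchanged.
+    CLASSES = ('Car', 'Pedestrian', 'Cyclist')
+```
+
+After registration, such a dataset can be referred to in configs by its type name, e.g. `type='MyKittiLikeDataset'`.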
+
+### KITTI dataset format
+
+Firstly, the raw data for 3D object detection from KITTI are typically organized as follows, where `ImageSets` contains split files indicating which files belong to the training/validation/testing set, `calib` contains calibration information files, `image_2` and `velodyne` include image data and point cloud data, and `label_2` includes label files for 3D detection.
+
+```
+mmdetection3d
+├── mmdet3d
+├── tools
+├── configs
+├── data
+│   ├── kitti
+│   │   ├── ImageSets
+│   │   ├── testing
+│   │   │   ├── calib
+│   │   │   ├── image_2
+│   │   │   ├── velodyne
+│   │   ├── training
+│   │   │   ├── calib
+│   │   │   ├── image_2
+│   │   │   ├── label_2
+│   │   │   ├── velodyne
+```
+
+The specific annotation format is described in the official object development [kit](https://s3.eu-central-1.amazonaws.com/avg-kitti/devkit_object.zip). For example, it consists of the following labels:
+
+```
+#Values    Name      Description
+----------------------------------------------------------------------------
+   1    type         Describes the type of object: 'Car', 'Van', 'Truck',
+                     'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram',
+                     'Misc' or 'DontCare'
+   1    truncated    Float from 0 (non-truncated) to 1 (truncated), where
+                     truncated refers to the object leaving image boundaries
+   1    occluded     Integer (0,1,2,3) indicating occlusion state:
+                     0 = fully visible, 1 = partly occluded
+                     2 = largely occluded, 3 = unknown
+   1    alpha        Observation angle of object, ranging [-pi..pi]
+   4    bbox         2D bounding box of object in the image (0-based index):
+                     contains left, top, right, bottom pixel coordinates
+   3    dimensions   3D object dimensions: height, width, length (in meters)
+   3    location     3D object location x,y,z in camera coordinates (in meters)
+   1    rotation_y   Rotation ry around Y-axis in camera coordinates [-pi..pi]
+   1    score        Only for results: Float, indicating confidence in
+                     detection, needed for p/r curves, higher is better.
+```
+
+Assume we use the Waymo dataset.
+After downloading the data, we need to implement a function to convert both the input data and annotation format into the KITTI style. Then we can implement `WaymoDataset`, which inherits from `KittiDataset`, to load the data and perform training and evaluation.
+
+Specifically, we implement a Waymo [converter](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/data_converter/waymo_converter.py) to convert Waymo data into the KITTI format and a Waymo dataset [class](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/waymo_dataset.py) to process it. Because we preprocess the raw data and reorganize it like KITTI, the dataset class can be implemented more easily by inheriting from `KittiDataset`. The last thing to note is the evaluation protocol you would like to use. Because Waymo has its own evaluation approach, we further incorporate it into our dataset class. Afterwards, users can successfully convert the data format and use `WaymoDataset` to train and evaluate the model.
+
+## Prepare a config
+
+The second step is to prepare configs such that the dataset can be successfully loaded. In addition, adjusting hyperparameters is usually necessary to obtain decent performance in 3D detection.
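+
+As a rough sketch, such a config usually inherits the relevant `_base_` files and then overrides only the dataset-related fields. The file names and values below are placeholders for illustration rather than one of the released configs:
+
+```python
+_base_ = [
+    '../_base_/models/my_model.py',      # hypothetical base model config
+    '../_base_/datasets/my_dataset.py',  # hypothetical base dataset config
+    '../_base_/schedules/my_schedule.py',
+    '../_base_/default_runtime.py'
+]
+
+# Override only the fields that differ for the customized dataset.
+class_names = ['Car', 'Pedestrian', 'Cyclist']
+data = dict(
+    samples_per_gpu=2,
+    train=dict(classes=class_names),
+    val=dict(classes=class_names),
+    test=dict(classes=class_names))
+```
+
+The Waymo configs referred to in the next paragraph follow this general pattern.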
+
+Suppose we would like to train PointPillars on Waymo to achieve 3D detection for 3 classes: vehicle, cyclist and pedestrian. We need to prepare a dataset config like [this](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/waymo_dataset.py), a model config like [this](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/models/hv_pointpillars_secfpn_waymo.py) and combine them like [this](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymoD5-3d-3class.py), compared to the KITTI [dataset config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/datasets/kitti-3d-3class.py), [model config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/models/hv_pointpillars_secfpn_kitti.py) and [overall config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py).
+
+## Train a new model
+
+To train a model with the new config, you can simply run
+
+```shell
+python tools/train.py configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymoD5-3d-3class.py
+```
+
+For more detailed usages, please refer to [Case 1](1_exist_data_model.md).
+
+## Test and inference
+
+To test the trained model, you can simply run
+
+```shell
+python tools/test.py configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymoD5-3d-3class.py work_dirs/hv_pointpillars_secfpn_sbn_2x16_2x_waymoD5-3d-3class/latest.pth --eval waymo
+```
+
+**Note**: To use the Waymo evaluation protocol, you need to follow the [tutorial](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/tutorials/waymo.md) and prepare the files related to metrics computation as the official instructions describe.
+
+For more detailed usages for test and inference, please refer to [Case 1](1_exist_data_model.md).

From 8fb92d12665e1f77f3d25c1e599591af9bc1877e Mon Sep 17 00:00:00 2001
From: twang <30491025+Tai-Wang@users.noreply.github.com>
Date: Mon, 21 Dec 2020 20:02:24 +0800
Subject: [PATCH 15/43] Update Tutorial 1

---
 docs/tutorials/config.md | 466 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 466 insertions(+)
 create mode 100644 docs/tutorials/config.md

diff --git a/docs/tutorials/config.md b/docs/tutorials/config.md
new file mode 100644
index 0000000000..41cb1a4e12
--- /dev/null
+++ b/docs/tutorials/config.md
@@ -0,0 +1,466 @@
+# Tutorial 1: Learn about Configs
+
+We incorporate modular and inheritance design into our config system, which is convenient for conducting various experiments.
+If you wish to inspect the config file, you may run `python tools/print_config.py /PATH/TO/CONFIG` to see the complete config.
+You may also pass `--options xxx.yyy=zzz` to see the updated config.
+
+## Config File Structure
+
+There are 4 basic component types under `config/_base_`: dataset, model, schedule, default_runtime.
+Many methods could be easily constructed with one of each like SECOND, PointPillars, PartA2, and VoteNet.
+The configs that are composed of components from `_base_` are called _primitive_.
+
+For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum inheritance level is 3.
+
+For easy understanding, we recommend contributors to inherit from existing methods.
+For example, if some modification is made based on PointPillars, users may first inherit the basic PointPillars structure by specifying `_base_ = ../pointpillars/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d.py`, then modify the necessary fields in the config files.
+
+If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder `xxx_rcnn` under `configs`.
+
+Please refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#config) for detailed documentation.
+
+## Config Name Style
+
+We follow the style below to name config files, and a concrete example is decomposed after the field list. Contributors are advised to follow the same style.
+
+```
+{model}_[model setting]_{backbone}_{neck}_[norm setting]_[misc]_[gpu x batch_per_gpu]_{schedule}_{dataset}
+```
+
+`{xxx}` is a required field and `[yyy]` is optional.
+
+- `{model}`: model type like `hv_pointpillars` (Hard Voxelization PointPillars), `VoteNet`, etc.
+- `[model setting]`: specific setting for some model.
+- `{backbone}`: backbone type like `regnet-400mf`, `regnet-1.6gf`.
+- `{neck}`: neck type like `fpn`, `secfpn`.
+- `[norm_setting]`: `bn` (Batch Normalization) is used unless specified, other norm layer types could be `gn` (Group Normalization), `sbn` (Synchronized Batch Normalization).
+`gn-head`/`gn-neck` indicates GN is applied in the head/neck only, while `gn-all` means GN is applied in the entire model, e.g. backbone, neck, head.
+- `[misc]`: miscellaneous setting/plugins of the model, e.g. `strong-aug` means using stronger augmentation strategies for training.
+- `[batch_per_gpu x gpu]`: samples per GPU and GPUs, `4x8` is used by default.
+- `{schedule}`: training schedule, options are `1x`, `2x`, `20e`, etc.
+`1x` and `2x` mean 12 epochs and 24 epochs respectively.
+`20e` is adopted in cascade models, which denotes 20 epochs.
+For `1x`/`2x`, the initial learning rate decays by a factor of 10 at the 8th/16th and 11th/22nd epochs.
+For `20e`, the initial learning rate decays by a factor of 10 at the 16th and 19th epochs.
+- `{dataset}`: dataset like `nus-3d`, `kitti-3d`, `lyft-3d`, `scannet-3d`, `sunrgbd-3d`. We also indicate the number of classes we are using if there exist multiple settings, e.g., `kitti-3d-3class` and `kitti-3d-car` mean training on the KITTI dataset with 3 classes and a single class, respectively.
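+
+For instance, the config name mentioned above, `hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d.py`, can be read field by field as follows (a rough decomposition for illustration):
+
+```
+hv_pointpillars  -> {model}: PointPillars with hard voxelization
+fpn              -> {neck}: FPN neck
+sbn-all          -> [norm setting]: synchronized BN applied in the entire model
+4x8              -> the samples-per-GPU and GPU setting described above
+2x               -> {schedule}: 24-epoch schedule
+nus-3d           -> {dataset}: nuScenes 3D detection
+```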
+ +## An example of VoteNet + +```python +model = dict( + type='VoteNet', # The type of detector, refer to mmdet3d.models.detectors for more details + backbone=dict( + type='PointNet2SASSG', # The type of the backbone, refer to mmdet3d.models.backbones for more details + in_channels=4, # Input channels of point cloud + num_points=(2048, 1024, 512, 256), # The number of points which each SA module samples + radius=(0.2, 0.4, 0.8, 1.2), # Radius for each set abstraction layer + num_samples=(64, 32, 16, 16), # Number of samples for each set abstraction layer + sa_channels=((64, 64, 128), (128, 128, 256), (128, 128, 256), + (128, 128, 256)), # Out channels of each mlp in SA module + fp_channels=((256, 256), (256, 256)), # Out channels of each mlp in FP module + norm_cfg=dict(type='BN2d'), # Config of normalization layer + sa_cfg=dict( # Config of point set abstraction (SA) module + type='PointSAModule', # type of SA module + pool_mod='max', # Pool method ('max' or 'avg') for SA modules + use_xyz=True, # Whether to use xyz as features during feature gathering + normalize_xyz=True)), # Whether to use normalized xyz as feature during feature gathering + bbox_head=dict( + type='VoteHead', # The type of bbox head, refer to mmdet3d.models.dense_heads for more details + num_classes=18, # Number of classes for classification + bbox_coder=dict( + type='PartialBinBasedBBoxCoder', # The type of bbox_coder, refer to mmdet3d.core.bbox.coders for more details + num_sizes=18, # Number of size clusters + num_dir_bins=1, # Number of bins to encode direction angle + with_rot=False, # Whether the bbox is with rotation + mean_sizes=[[0.76966727, 0.8116021, 0.92573744], + [1.876858, 1.8425595, 1.1931566], + [0.61328, 0.6148609, 0.7182701], + [1.3955007, 1.5121545, 0.83443564], + [0.97949594, 1.0675149, 0.6329687], + [0.531663, 0.5955577, 1.7500148], + [0.9624706, 0.72462326, 1.1481868], + [0.83221924, 1.0490936, 1.6875663], + [0.21132214, 0.4206159, 0.5372846], + [1.4440073, 1.8970833, 0.26985747], + [1.0294262, 1.4040797, 0.87554324], + [1.3766412, 0.65521795, 1.6813129], + [0.6650819, 0.71111923, 1.298853], + [0.41999173, 0.37906948, 1.7513971], + [0.59359556, 0.5912492, 0.73919016], + [0.50867593, 0.50656086, 0.30136237], + [1.1511526, 1.0546296, 0.49706793], + [0.47535285, 0.49249494, 0.5802117]]), # Mean sizes for each class, the order is consistent with class_names. 
+ vote_moudule_cfg=dict( # Config to vote module branch, refer to mmdet3d.models.model_utils for more details + in_channels=256, # Input channels for vote_module + vote_per_seed=1, # Number of votes to generate for each seed + gt_per_seed=3, # Number of gts for each seed + conv_channels=(256, 256), # Channels for convolution + conv_cfg=dict(type='Conv1d'), # Config to convolution + norm_cfg=dict(type='BN1d'), # Config to normalization + norm_feats=True, # Whether to normalize features + vote_loss=dict( # Config to the loss function for voting branch + type='ChamferDistance', # Type of loss for voting branch + mode='l1', # Loss mode of voting branch + reduction='none', # Specifies the reduction to apply to the output + loss_dst_weight=10.0)), # Destination loss weight of the voting branch + vote_aggregation_cfg=dict( # Config to vote aggregation branch + type='PointSAModule', # type of vote aggregation module + num_point=256, # Number of points for the set abstraction layer in vote aggregation branch + radius=0.3, # Radius for the set abstraction layer in vote aggregation branch + num_sample=16, # Number of samples for the set abstraction layer in vote aggregation branch + mlp_channels=[256, 128, 128, 128], # Mlp channels for the set abstraction layer in vote aggregation branch + use_xyz=True, # Whether to use xyz + normalize_xyz=True), # Whether to normalize xyz + feat_channels=(128, 128), # Channels for feature convolution + conv_cfg=dict(type='Conv1d'), # Config to convolution + norm_cfg=dict(type='BN1d'), # Config to normalization + objectness_loss=dict( # Config to objectness loss + type='CrossEntropyLoss', # Type of loss + class_weight=[0.2, 0.8], # Class weight of the objectness loss + reduction='sum', # Specifies the reduction to apply to the output + loss_weight=5.0), # Loss weight of the objectness loss + center_loss=dict( # Config to center loss + type='ChamferDistance', # Type of loss + mode='l2', # Loss mode of center loss + reduction='sum', # Specifies the reduction to apply to the output + loss_src_weight=10.0, # Source loss weight of the voting branch. + loss_dst_weight=10.0), # Destination loss weight of the voting branch. 
+ dir_class_loss=dict( # Config to direction classification loss + type='CrossEntropyLoss', # Type of loss + reduction='sum', # Specifies the reduction to apply to the output + loss_weight=1.0), # Loss weight of the direction classification loss + dir_res_loss=dict( # Config to direction residual loss + type='SmoothL1Loss', # Type of loss + reduction='sum', # Specifies the reduction to apply to the output + loss_weight=10.0), # Loss weight of the direction residual loss + size_class_loss=dict( # Config to size classification loss + type='CrossEntropyLoss', # Type of loss + reduction='sum', # Specifies the reduction to apply to the output + loss_weight=1.0), # Loss weight of the size classification loss + size_res_loss=dict( # Config to size residual loss + type='SmoothL1Loss', # Type of loss + reduction='sum', # Specifies the reduction to apply to the output + loss_weight=3.3333333333333335), # Loss weight of the size residual loss + semantic_loss=dict( # Config to semantic loss + type='CrossEntropyLoss', # Type of loss + reduction='sum', # Specifies the reduction to apply to the output + loss_weight=1.0))) # Loss weight of the semantic loss +train_cfg = dict( # Config of training hyperparameters for votenet + pos_distance_thr=0.3, # distance >= threshold 0.3 will be taken as positive samples + neg_distance_thr=0.6, # distance < threshold 0.6 will be taken as positive samples + sample_mod='vote') # Mode of the sampling method +test_cfg = dict( # Config of testing hyperparameters for votenet + sample_mod='seed', # Mode of the sampling method + nms_thr=0.25, # The threshold to be used during NMS + score_thr=0.8, # Threshold to filter out boxes + per_class_proposal=False) # Whether to use per_class_proposal +dataset_type = 'ScanNetDataset' # Type of the dataset +data_root = './data/scannet/' # Root path of the data +class_names = ('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window', + 'bookshelf', 'picture', 'counter', 'desk', 'curtain', + 'refrigerator', 'showercurtrain', 'toilet', 'sink', 'bathtub', + 'garbagebin') # Names of classes +train_pipeline = [ # Training pipeline, refer to mmdet3d.datasets.pipelines for more details + dict( + type='LoadPointsFromFile', # First pipeline to load points, refer to mmdet3d.datasets.pipelines.indoor_loading for more details + shift_height=True, # Whether to use shifted height + load_dim=6, # The dimension of the loaded points + use_dim=[0, 1, 2]), # Which dimensions of the points to be used + dict( + type='LoadAnnotations3D', # Second pipeline to load annotations, refer to mmdet3d.datasets.pipelines.indoor_loading for more details + with_bbox_3d=True, # Whether to load 3D boxes + with_label_3d=True, # Whether to load 3D labels + with_mask_3d=True, # Whether to load 3D instance masks + with_seg_3d=True), # Whether to load 3D semantic masks + dict( + type='PointSegClassMapping', # Declare valid categories, refer to mmdet3d.datasets.pipelines.point_seg_class_mapping for more details + valid_cat_ids=(3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33, 34, + 36, 39)), + dict(type='IndoorPointSample', # Sample indoor points, refer to mmdet3d.datasets.pipelines.indoor_sample for more details + num_points=40000), # Number of points to be sampled + dict(type='IndoorFlipData', # Augmentation pipeline that flip points and 3d boxes + flip_ratio_yz=0.5, # Probability of being flipped along yz plane + flip_ratio_xz=0.5), # Probability of being flipped along xz plane + dict( + type='IndoorGlobalRotScale', # Augmentation pipeline that rotate and scale points 
and 3d boxes, refer to mmdet3d.datasets.pipelines.indoor_augment for more details + shift_height=True, # Whether to use height + rot_range=[-0.027777777777777776, 0.027777777777777776], # Range of rotation + scale_range=None), # Range of scale + dict( + type='DefaultFormatBundle3D', # Default format bundle to gather data in the pipeline, refer to mmdet3d.datasets.pipelines.formating for more details + class_names=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', + 'window', 'bookshelf', 'picture', 'counter', 'desk', + 'curtain', 'refrigerator', 'showercurtrain', 'toilet', + 'sink', 'bathtub', 'garbagebin')), + dict( + type='Collect3D', # Pipeline that decides which keys in the data should be passed to the detector, refer to mmdet3d.datasets.pipelines.formating for more details + keys=[ + 'points', 'gt_bboxes_3d', 'gt_labels_3d', 'pts_semantic_mask', + 'pts_instance_mask' + ]) +] +test_pipeline = [ # Testing pipeline, refer to mmdet3d.datasets.pipelines for more details + dict( + type='LoadPointsFromFile', # First pipeline to load points, refer to mmdet3d.datasets.pipelines.indoor_loading for more details + shift_height=True, # Whether to use shifted height + load_dim=6, # The dimension of the loaded points + use_dim=[0, 1, 2]), # Which dimensions of the points to be used + dict(type='IndoorPointSample', # Sample indoor points, refer to mmdet3d.datasets.pipelines.indoor_sample for more details + num_points=40000), # Number of points to be sampled + dict( + type='DefaultFormatBundle3D', # Default format bundle to gather data in the pipeline, refer to mmdet3d.datasets.pipelines.formating for more details + class_names=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', + 'window', 'bookshelf', 'picture', 'counter', 'desk', + 'curtain', 'refrigerator', 'showercurtrain', 'toilet', + 'sink', 'bathtub', 'garbagebin')), + dict(type='Collect3D', # Pipeline that decides which keys in the data should be passed to the detector, refer to mmdet3d.datasets.pipelines.formating for more details + keys=['points']) +] +data = dict( + samples_per_gpu=8, # Batch size of a single GPU + workers_per_gpu=4, # Worker to pre-fetch data for each single GPU + train=dict( # Train dataset config + type='RepeatDataset', # Wrapper of dataset, refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/dataset_wrappers.py for details. + times=5, # Repeat times + dataset=dict( + type='ScanNetDataset', # Type of dataset + data_root='./data/scannet/', # Root path of the data + ann_file='./data/scannet/scannet_infos_train.pkl', # Ann path of the data + pipeline=[ # pipeline, this is passed by the train_pipeline created before. 
+ dict( + type='LoadPointsFromFile', + shift_height=True, + load_dim=6, + use_dim=[0, 1, 2]), + dict( + type='LoadAnnotations3D', + with_bbox_3d=True, + with_label_3d=True, + with_mask_3d=True, + with_seg_3d=True), + dict( + type='PointSegClassMapping', + valid_cat_ids=(3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, + 28, 33, 34, 36, 39)), + dict(type='IndoorPointSample', num_points=40000), + dict( + type='IndoorFlipData', + flip_ratio_yz=0.5, + flip_ratio_xz=0.5), + dict( + type='IndoorGlobalRotScale', + shift_height=True, + rot_range=[-0.027777777777777776, 0.027777777777777776], + scale_range=None), + dict( + type='DefaultFormatBundle3D', + class_names=('cabinet', 'bed', 'chair', 'sofa', 'table', + 'door', 'window', 'bookshelf', 'picture', + 'counter', 'desk', 'curtain', 'refrigerator', + 'showercurtrain', 'toilet', 'sink', 'bathtub', + 'garbagebin')), + dict( + type='Collect3D', + keys=[ + 'points', 'gt_bboxes_3d', 'gt_labels_3d', + 'pts_semantic_mask', 'pts_instance_mask' + ]) + ], + filter_empty_gt=False, # Whether to filter ground empty truth boxes + classes=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', + 'window', 'bookshelf', 'picture', 'counter', 'desk', + 'curtain', 'refrigerator', 'showercurtrain', 'toilet', + 'sink', 'bathtub', 'garbagebin'))), # Names of classes + val=dict( # Validation dataset config + type='ScanNetDataset', # Type of dataset + data_root='./data/scannet/', # Root path of the data + ann_file='./data/scannet/scannet_infos_val.pkl', # Ann path of the data + pipeline=[ # Pipeline is passed by test_pipeline created before + dict( + type='LoadPointsFromFile', + shift_height=True, + load_dim=6, + use_dim=[0, 1, 2]), + dict(type='IndoorPointSample', num_points=40000), + dict( + type='DefaultFormatBundle3D', + class_names=('cabinet', 'bed', 'chair', 'sofa', 'table', + 'door', 'window', 'bookshelf', 'picture', + 'counter', 'desk', 'curtain', 'refrigerator', + 'showercurtrain', 'toilet', 'sink', 'bathtub', + 'garbagebin')), + dict(type='Collect3D', keys=['points']) + ], + classes=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window', + 'bookshelf', 'picture', 'counter', 'desk', 'curtain', + 'refrigerator', 'showercurtrain', 'toilet', 'sink', 'bathtub', + 'garbagebin'), # Names of classes + test_mode=True), # Whether to use test mode + test=dict( # Test dataset config + type='ScanNetDataset', # Type of dataset + data_root='./data/scannet/', # Root path of the data + ann_file='./data/scannet/scannet_infos_val.pkl', # Ann path of the data + pipeline=[ # Pipeline is passed by test_pipeline created before + dict( + type='LoadPointsFromFile', + shift_height=True, + load_dim=6, + use_dim=[0, 1, 2]), + dict(type='IndoorPointSample', num_points=40000), + dict( + type='DefaultFormatBundle3D', + class_names=('cabinet', 'bed', 'chair', 'sofa', 'table', + 'door', 'window', 'bookshelf', 'picture', + 'counter', 'desk', 'curtain', 'refrigerator', + 'showercurtrain', 'toilet', 'sink', 'bathtub', + 'garbagebin')), + dict(type='Collect3D', keys=['points']) + ], + classes=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window', + 'bookshelf', 'picture', 'counter', 'desk', 'curtain', + 'refrigerator', 'showercurtrain', 'toilet', 'sink', 'bathtub', + 'garbagebin'), # Names of classes + test_mode=True)) # Whether to use test mode +lr = 0.008 # Learning rate of optimizers +optimizer = dict( # Config used to build optimizer, support all the optimizers in PyTorch whose arguments are also the same as those in PyTorch + type='Adam', # Type of optimizers, # Type of optimizers, 
refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/optimizer/default_constructor.py#L13 for more details
+    lr=0.008)  # Learning rate of optimizers, see detailed usage of the parameters in the documentation of PyTorch
+optimizer_config = dict(  # Config used to build the optimizer hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/optimizer.py#L8 for implementation details.
+    grad_clip=dict(  # Config used to grad_clip
+    max_norm=10,  # max norm of the gradients
+    norm_type=2))  # Type of the used p-norm. Can be 'inf' for infinity norm.
+lr_config = dict(  # Learning rate scheduler config used to register LrUpdater hook
+    policy='step',  # The policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9.
+    warmup=None,  # The warmup policy, also support `exp` and `constant`.
+    step=[24, 32])  # Steps to decay the learning rate
+checkpoint_config = dict(  # Config to set the checkpoint hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation.
+    interval=1)  # The save interval is 1
+log_config = dict(  # Config to register logger hook
+    interval=50,  # Interval to print the log
+    hooks=[dict(type='TextLoggerHook'),
+           dict(type='TensorboardLoggerHook')])  # The logger used to record the training process.
+total_epochs = 36  # Total epochs to train the model
+dist_params = dict(backend='nccl')  # Parameters to setup distributed training, the port can also be set.
+log_level = 'INFO'  # The level of logging.
+find_unused_parameters = True  # Whether to find unused parameters
+work_dir = None  # Directory to save the model checkpoints and logs for the current experiments.
+load_from = None  # Load models as a pre-trained model from a given path. This will not resume training.
+resume_from = None  # Resume checkpoints from a given path, the training will be resumed from the epoch when the checkpoint is saved.
+workflow = [('train', 1)]  # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once. The workflow trains the model by 36 epochs according to the total_epochs.
+gpu_ids = range(0, 1)  # ids of gpus
+```
+
+## FAQ
+
+### Ignore some fields in the base configs
+
+Sometimes, you may set `_delete_=True` to ignore some of the fields in the base configs.
+You may refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#inherit-from-base-config-with-ignored-fields) for a simple illustration.
+
+For example, suppose we want to change the FPN neck of PointPillars in MMDetection or MMDetection3D, whose base config contains the following definition.
+
+```python
+model = dict(
+    type='MVXFasterRCNN',
+    pts_voxel_layer=dict(...),
+    pts_voxel_encoder=dict(...),
+    pts_middle_encoder=dict(...),
+    pts_backbone=dict(...),
+    pts_neck=dict(
+        type='FPN',
+        norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
+        act_cfg=dict(type='ReLU'),
+        in_channels=[64, 128, 256],
+        out_channels=256,
+        start_level=0,
+        num_outs=3),
+    pts_bbox_head=dict(...))
+```
+
+`FPN` and `SECONDFPN` use different keywords for construction:
+ +```python +_base_ = '../_base_/models/hv_pointpillars_fpn_nus.py' +model = dict( + pts_neck=dict( + _delete_=True, + type='SECONDFPN', + norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01), + in_channels=[64, 128, 256], + upsample_strides=[1, 2, 4], + out_channels=[128, 128, 128]), + pts_bbox_head=dict(...)) +``` + +The `_delete_=True` would replace all old keys in `pts_neck` field with new keys. + +### Use intermediate variables in configs + +Some intermediate variables are used in the configs files, like `train_pipeline`/`test_pipeline` in datasets. +It's worth noting that when modifying intermediate variables in the children configs, user needs to pass the intermediate variables into corresponding fields again. +For example, we would like to use multi scale strategy to train and test a PointPillars. `train_pipeline`/`test_pipeline` are intermediate variable we would like modify. + +```python +_base_ = './nus-3d.py' +train_pipeline = [ + dict( + type='LoadPointsFromFile', + load_dim=5, + use_dim=5, + file_client_args=file_client_args), + dict( + type='LoadPointsFromMultiSweeps', + sweeps_num=10, + file_client_args=file_client_args), + dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True), + dict( + type='GlobalRotScaleTrans', + rot_range=[-0.3925, 0.3925], + scale_ratio_range=[0.95, 1.05], + translation_std=[0, 0, 0]), + dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5), + dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range), + dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range), + dict(type='ObjectNameFilter', classes=class_names), + dict(type='PointShuffle'), + dict(type='DefaultFormatBundle3D', class_names=class_names), + dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d']) +] +test_pipeline = [ + dict( + type='LoadPointsFromFile', + load_dim=5, + use_dim=5, + file_client_args=file_client_args), + dict( + type='LoadPointsFromMultiSweeps', + sweeps_num=10, + file_client_args=file_client_args), + dict( + type='MultiScaleFlipAug3D', + img_scale=(1333, 800), + pts_scale_ratio=[0.95, 1.0, 1.05], + flip=False, + transforms=[ + dict( + type='GlobalRotScaleTrans', + rot_range=[0, 0], + scale_ratio_range=[1., 1.], + translation_std=[0, 0, 0]), + dict(type='RandomFlip3D'), + dict( + type='PointsRangeFilter', point_cloud_range=point_cloud_range), + dict( + type='DefaultFormatBundle3D', + class_names=class_names, + with_label=False), + dict(type='Collect3D', keys=['points']) + ]) +] +data = dict( + train=dict(pipeline=train_pipeline), + val=dict(pipeline=test_pipeline), + test=dict(pipeline=test_pipeline)) +``` + +We first define the new `train_pipeline`/`test_pipeline` and pass them into `data`. From 61687520abd492f4c84bcd0c0e0b068c78afb255 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Mon, 21 Dec 2020 20:02:46 +0800 Subject: [PATCH 16/43] Moved into Tutorials --- docs/config.md | 465 ------------------------------------------------- 1 file changed, 465 deletions(-) delete mode 100644 docs/config.md diff --git a/docs/config.md b/docs/config.md deleted file mode 100644 index 4ee31c6443..0000000000 --- a/docs/config.md +++ /dev/null @@ -1,465 +0,0 @@ -# Config System -We incorporate modular and inheritance design into our config system, which is convenient to conduct various experiments. -If you wish to inspect the config file, you may run `python tools/print_config.py /PATH/TO/CONFIG` to see the complete config. 
-You may also pass `--options xxx.yyy=zzz` to see updated config. - -## Config File Structure - -There are 4 basic component types under `config/_base_`, dataset, model, schedule, default_runtime. -Many methods could be easily constructed with one of each like SECOND, PointPillars, PartA2, and VoteNet. -The configs that are composed by components from `_base_` are called _primitive_. - -For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum of inheritance level is 3. - -For easy understanding, we recommend contributors to inherit from exiting methods. -For example, if some modification is made based on PointPillars, user may first inherit the basic PointPillars structure by specifying `_base_ = ../pointpillars/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d.py`, then modify the necessary fields in the config files. - -If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder `xxx_rcnn` under `configs`, - -Please refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#config) for detailed documentation. - -## Config Name Style - -We follow the below style to name config files. Contributors are advised to follow the same style. - -``` -{model}_[model setting]_{backbone}_{neck}_[norm setting]_[misc]_[gpu x batch_per_gpu]_{schedule}_{dataset} -``` - -`{xxx}` is required field and `[yyy]` is optional. - -- `{model}`: model type like `hv_pointpillars` (Hard Voxelization PointPillars), `VoteNet`, etc. -- `[model setting]`: specific setting for some model. -- `{backbone}`: backbone type like `regnet-400mf`, `regnet-1.6gf`. -- `{neck}`: neck type like `fpn`, `secfpn`. -- `[norm_setting]`: `bn` (Batch Normalization) is used unless specified, other norm layer type could be `gn` (Group Normalization), `sbn` (Synchronized Batch Normalization). -`gn-head`/`gn-neck` indicates GN is applied in head/neck only, while `gn-all` means GN is applied in the entire model, e.g. backbone, neck, head. -- `[misc]`: miscellaneous setting/plugins of model, e.g. `strong-aug` means using stronger augmentation strategies for training. -- `[batch_per_gpu x gpu]`: GPUs and samples per GPU, `4x8` is used by default. -- `{schedule}`: training schedule, options are `1x`, `2x`, `20e`, etc. -`1x` and `2x` means 12 epochs and 24 epochs respectively. -`20e` is adopted in cascade models, which denotes 20 epochs. -For `1x`/`2x`, initial learning rate decays by a factor of 10 at the 8/16th and 11/22th epochs. -For `20e`, initial learning rate decays by a factor of 10 at the 16th and 19th epochs. -- `{dataset}`: dataset like `nus-3d`, `kitti-3d`, `lyft-3d`, `scannet-3d`, `sunrgbd-3d`. We also indicate the number of classes we are using if there exist multiple settings, e.g., `kitti-3d-3class` and `kitti-3d-car` means training on KITTI dataset with 3 classes and single class, respectively. 
- -## An example of VoteNet - -```python -model = dict( - type='VoteNet', # The type of detector, refer to mmdet3d.models.detectors for more details - backbone=dict( - type='PointNet2SASSG', # The type of the backbone, refer to mmdet3d.models.backbones for more details - in_channels=4, # Input channels of point cloud - num_points=(2048, 1024, 512, 256), # The number of points which each SA module samples - radius=(0.2, 0.4, 0.8, 1.2), # Radius for each set abstraction layer - num_samples=(64, 32, 16, 16), # Number of samples for each set abstraction layer - sa_channels=((64, 64, 128), (128, 128, 256), (128, 128, 256), - (128, 128, 256)), # Out channels of each mlp in SA module - fp_channels=((256, 256), (256, 256)), # Out channels of each mlp in FP module - norm_cfg=dict(type='BN2d'), # Config of normalization layer - sa_cfg=dict( # Config of point set abstraction (SA) module - type='PointSAModule', # type of SA module - pool_mod='max', # Pool method ('max' or 'avg') for SA modules - use_xyz=True, # Whether to use xyz as features during feature gathering - normalize_xyz=True)), # Whether to use normalized xyz as feature during feature gathering - bbox_head=dict( - type='VoteHead', # The type of bbox head, refer to mmdet3d.models.dense_heads for more details - num_classes=18, # Number of classes for classification - bbox_coder=dict( - type='PartialBinBasedBBoxCoder', # The type of bbox_coder, refer to mmdet3d.core.bbox.coders for more details - num_sizes=18, # Number of size clusters - num_dir_bins=1, # Number of bins to encode direction angle - with_rot=False, # Whether the bbox is with rotation - mean_sizes=[[0.76966727, 0.8116021, 0.92573744], - [1.876858, 1.8425595, 1.1931566], - [0.61328, 0.6148609, 0.7182701], - [1.3955007, 1.5121545, 0.83443564], - [0.97949594, 1.0675149, 0.6329687], - [0.531663, 0.5955577, 1.7500148], - [0.9624706, 0.72462326, 1.1481868], - [0.83221924, 1.0490936, 1.6875663], - [0.21132214, 0.4206159, 0.5372846], - [1.4440073, 1.8970833, 0.26985747], - [1.0294262, 1.4040797, 0.87554324], - [1.3766412, 0.65521795, 1.6813129], - [0.6650819, 0.71111923, 1.298853], - [0.41999173, 0.37906948, 1.7513971], - [0.59359556, 0.5912492, 0.73919016], - [0.50867593, 0.50656086, 0.30136237], - [1.1511526, 1.0546296, 0.49706793], - [0.47535285, 0.49249494, 0.5802117]]), # Mean sizes for each class, the order is consistent with class_names. 
- vote_moudule_cfg=dict( # Config to vote module branch, refer to mmdet3d.models.model_utils for more details - in_channels=256, # Input channels for vote_module - vote_per_seed=1, # Number of votes to generate for each seed - gt_per_seed=3, # Number of gts for each seed - conv_channels=(256, 256), # Channels for convolution - conv_cfg=dict(type='Conv1d'), # Config to convolution - norm_cfg=dict(type='BN1d'), # Config to normalization - norm_feats=True, # Whether to normalize features - vote_loss=dict( # Config to the loss function for voting branch - type='ChamferDistance', # Type of loss for voting branch - mode='l1', # Loss mode of voting branch - reduction='none', # Specifies the reduction to apply to the output - loss_dst_weight=10.0)), # Destination loss weight of the voting branch - vote_aggregation_cfg=dict( # Config to vote aggregation branch - type='PointSAModule', # type of vote aggregation module - num_point=256, # Number of points for the set abstraction layer in vote aggregation branch - radius=0.3, # Radius for the set abstraction layer in vote aggregation branch - num_sample=16, # Number of samples for the set abstraction layer in vote aggregation branch - mlp_channels=[256, 128, 128, 128], # Mlp channels for the set abstraction layer in vote aggregation branch - use_xyz=True, # Whether to use xyz - normalize_xyz=True), # Whether to normalize xyz - feat_channels=(128, 128), # Channels for feature convolution - conv_cfg=dict(type='Conv1d'), # Config to convolution - norm_cfg=dict(type='BN1d'), # Config to normalization - objectness_loss=dict( # Config to objectness loss - type='CrossEntropyLoss', # Type of loss - class_weight=[0.2, 0.8], # Class weight of the objectness loss - reduction='sum', # Specifies the reduction to apply to the output - loss_weight=5.0), # Loss weight of the objectness loss - center_loss=dict( # Config to center loss - type='ChamferDistance', # Type of loss - mode='l2', # Loss mode of center loss - reduction='sum', # Specifies the reduction to apply to the output - loss_src_weight=10.0, # Source loss weight of the voting branch. - loss_dst_weight=10.0), # Destination loss weight of the voting branch. 
- dir_class_loss=dict( # Config to direction classification loss - type='CrossEntropyLoss', # Type of loss - reduction='sum', # Specifies the reduction to apply to the output - loss_weight=1.0), # Loss weight of the direction classification loss - dir_res_loss=dict( # Config to direction residual loss - type='SmoothL1Loss', # Type of loss - reduction='sum', # Specifies the reduction to apply to the output - loss_weight=10.0), # Loss weight of the direction residual loss - size_class_loss=dict( # Config to size classification loss - type='CrossEntropyLoss', # Type of loss - reduction='sum', # Specifies the reduction to apply to the output - loss_weight=1.0), # Loss weight of the size classification loss - size_res_loss=dict( # Config to size residual loss - type='SmoothL1Loss', # Type of loss - reduction='sum', # Specifies the reduction to apply to the output - loss_weight=3.3333333333333335), # Loss weight of the size residual loss - semantic_loss=dict( # Config to semantic loss - type='CrossEntropyLoss', # Type of loss - reduction='sum', # Specifies the reduction to apply to the output - loss_weight=1.0))) # Loss weight of the semantic loss -train_cfg = dict( # Config of training hyperparameters for votenet - pos_distance_thr=0.3, # distance >= threshold 0.3 will be taken as positive samples - neg_distance_thr=0.6, # distance < threshold 0.6 will be taken as positive samples - sample_mod='vote') # Mode of the sampling method -test_cfg = dict( # Config of testing hyperparameters for votenet - sample_mod='seed', # Mode of the sampling method - nms_thr=0.25, # The threshold to be used during NMS - score_thr=0.8, # Threshold to filter out boxes - per_class_proposal=False) # Whether to use per_class_proposal -dataset_type = 'ScanNetDataset' # Type of the dataset -data_root = './data/scannet/' # Root path of the data -class_names = ('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window', - 'bookshelf', 'picture', 'counter', 'desk', 'curtain', - 'refrigerator', 'showercurtrain', 'toilet', 'sink', 'bathtub', - 'garbagebin') # Names of classes -train_pipeline = [ # Training pipeline, refer to mmdet3d.datasets.pipelines for more details - dict( - type='LoadPointsFromFile', # First pipeline to load points, refer to mmdet3d.datasets.pipelines.indoor_loading for more details - shift_height=True, # Whether to use shifted height - load_dim=6, # The dimension of the loaded points - use_dim=[0, 1, 2]), # Which dimensions of the points to be used - dict( - type='LoadAnnotations3D', # Second pipeline to load annotations, refer to mmdet3d.datasets.pipelines.indoor_loading for more details - with_bbox_3d=True, # Whether to load 3D boxes - with_label_3d=True, # Whether to load 3D labels - with_mask_3d=True, # Whether to load 3D instance masks - with_seg_3d=True), # Whether to load 3D semantic masks - dict( - type='PointSegClassMapping', # Declare valid categories, refer to mmdet3d.datasets.pipelines.point_seg_class_mapping for more details - valid_cat_ids=(3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33, 34, - 36, 39)), - dict(type='IndoorPointSample', # Sample indoor points, refer to mmdet3d.datasets.pipelines.indoor_sample for more details - num_points=40000), # Number of points to be sampled - dict(type='IndoorFlipData', # Augmentation pipeline that flip points and 3d boxes - flip_ratio_yz=0.5, # Probability of being flipped along yz plane - flip_ratio_xz=0.5), # Probability of being flipped along xz plane - dict( - type='IndoorGlobalRotScale', # Augmentation pipeline that rotate and scale points 
and 3d boxes, refer to mmdet3d.datasets.pipelines.indoor_augment for more details - shift_height=True, # Whether to use height - rot_range=[-0.027777777777777776, 0.027777777777777776], # Range of rotation - scale_range=None), # Range of scale - dict( - type='DefaultFormatBundle3D', # Default format bundle to gather data in the pipeline, refer to mmdet3d.datasets.pipelines.formating for more details - class_names=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', - 'window', 'bookshelf', 'picture', 'counter', 'desk', - 'curtain', 'refrigerator', 'showercurtrain', 'toilet', - 'sink', 'bathtub', 'garbagebin')), - dict( - type='Collect3D', # Pipeline that decides which keys in the data should be passed to the detector, refer to mmdet3d.datasets.pipelines.formating for more details - keys=[ - 'points', 'gt_bboxes_3d', 'gt_labels_3d', 'pts_semantic_mask', - 'pts_instance_mask' - ]) -] -test_pipeline = [ # Testing pipeline, refer to mmdet3d.datasets.pipelines for more details - dict( - type='LoadPointsFromFile', # First pipeline to load points, refer to mmdet3d.datasets.pipelines.indoor_loading for more details - shift_height=True, # Whether to use shifted height - load_dim=6, # The dimension of the loaded points - use_dim=[0, 1, 2]), # Which dimensions of the points to be used - dict(type='IndoorPointSample', # Sample indoor points, refer to mmdet3d.datasets.pipelines.indoor_sample for more details - num_points=40000), # Number of points to be sampled - dict( - type='DefaultFormatBundle3D', # Default format bundle to gather data in the pipeline, refer to mmdet3d.datasets.pipelines.formating for more details - class_names=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', - 'window', 'bookshelf', 'picture', 'counter', 'desk', - 'curtain', 'refrigerator', 'showercurtrain', 'toilet', - 'sink', 'bathtub', 'garbagebin')), - dict(type='Collect3D', # Pipeline that decides which keys in the data should be passed to the detector, refer to mmdet3d.datasets.pipelines.formating for more details - keys=['points']) -] -data = dict( - samples_per_gpu=8, # Batch size of a single GPU - workers_per_gpu=4, # Worker to pre-fetch data for each single GPU - train=dict( # Train dataset config - type='RepeatDataset', # Wrapper of dataset, refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/dataset_wrappers.py for details. - times=5, # Repeat times - dataset=dict( - type='ScanNetDataset', # Type of dataset - data_root='./data/scannet/', # Root path of the data - ann_file='./data/scannet/scannet_infos_train.pkl', # Ann path of the data - pipeline=[ # pipeline, this is passed by the train_pipeline created before. 
- dict( - type='LoadPointsFromFile', - shift_height=True, - load_dim=6, - use_dim=[0, 1, 2]), - dict( - type='LoadAnnotations3D', - with_bbox_3d=True, - with_label_3d=True, - with_mask_3d=True, - with_seg_3d=True), - dict( - type='PointSegClassMapping', - valid_cat_ids=(3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, - 28, 33, 34, 36, 39)), - dict(type='IndoorPointSample', num_points=40000), - dict( - type='IndoorFlipData', - flip_ratio_yz=0.5, - flip_ratio_xz=0.5), - dict( - type='IndoorGlobalRotScale', - shift_height=True, - rot_range=[-0.027777777777777776, 0.027777777777777776], - scale_range=None), - dict( - type='DefaultFormatBundle3D', - class_names=('cabinet', 'bed', 'chair', 'sofa', 'table', - 'door', 'window', 'bookshelf', 'picture', - 'counter', 'desk', 'curtain', 'refrigerator', - 'showercurtrain', 'toilet', 'sink', 'bathtub', - 'garbagebin')), - dict( - type='Collect3D', - keys=[ - 'points', 'gt_bboxes_3d', 'gt_labels_3d', - 'pts_semantic_mask', 'pts_instance_mask' - ]) - ], - filter_empty_gt=False, # Whether to filter ground empty truth boxes - classes=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', - 'window', 'bookshelf', 'picture', 'counter', 'desk', - 'curtain', 'refrigerator', 'showercurtrain', 'toilet', - 'sink', 'bathtub', 'garbagebin'))), # Names of classes - val=dict( # Validation dataset config - type='ScanNetDataset', # Type of dataset - data_root='./data/scannet/', # Root path of the data - ann_file='./data/scannet/scannet_infos_val.pkl', # Ann path of the data - pipeline=[ # Pipeline is passed by test_pipeline created before - dict( - type='LoadPointsFromFile', - shift_height=True, - load_dim=6, - use_dim=[0, 1, 2]), - dict(type='IndoorPointSample', num_points=40000), - dict( - type='DefaultFormatBundle3D', - class_names=('cabinet', 'bed', 'chair', 'sofa', 'table', - 'door', 'window', 'bookshelf', 'picture', - 'counter', 'desk', 'curtain', 'refrigerator', - 'showercurtrain', 'toilet', 'sink', 'bathtub', - 'garbagebin')), - dict(type='Collect3D', keys=['points']) - ], - classes=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window', - 'bookshelf', 'picture', 'counter', 'desk', 'curtain', - 'refrigerator', 'showercurtrain', 'toilet', 'sink', 'bathtub', - 'garbagebin'), # Names of classes - test_mode=True), # Whether to use test mode - test=dict( # Test dataset config - type='ScanNetDataset', # Type of dataset - data_root='./data/scannet/', # Root path of the data - ann_file='./data/scannet/scannet_infos_val.pkl', # Ann path of the data - pipeline=[ # Pipeline is passed by test_pipeline created before - dict( - type='LoadPointsFromFile', - shift_height=True, - load_dim=6, - use_dim=[0, 1, 2]), - dict(type='IndoorPointSample', num_points=40000), - dict( - type='DefaultFormatBundle3D', - class_names=('cabinet', 'bed', 'chair', 'sofa', 'table', - 'door', 'window', 'bookshelf', 'picture', - 'counter', 'desk', 'curtain', 'refrigerator', - 'showercurtrain', 'toilet', 'sink', 'bathtub', - 'garbagebin')), - dict(type='Collect3D', keys=['points']) - ], - classes=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window', - 'bookshelf', 'picture', 'counter', 'desk', 'curtain', - 'refrigerator', 'showercurtrain', 'toilet', 'sink', 'bathtub', - 'garbagebin'), # Names of classes - test_mode=True)) # Whether to use test mode -lr = 0.008 # Learning rate of optimizers -optimizer = dict( # Config used to build optimizer, support all the optimizers in PyTorch whose arguments are also the same as those in PyTorch - type='Adam', # Type of optimizers, # Type of optimizers, 
refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/optimizer/default_constructor.py#L13 for more details - lr=0.008) # Learning rate of optimizers, see detail usages of the parameters in the documentaion of PyTorch -optimizer_config = dict( # Config used to build the optimizer hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/optimizer.py#L8 for implementation details. - grad_clip=dict( # Config used to grad_clip - max_norm=10, # max norm of the gradients - norm_type=2)) # Type of the used p-norm. Can be 'inf' for infinity norm. -lr_config = dict( # Learning rate scheduler config used to register LrUpdater hook - policy='step', # The policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9. - warmup=None, # The warmup policy, also support `exp` and `constant`. - step=[24, 32]) # Steps to decay the learning rate -checkpoint_config = dict( # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation. - interval=1) # The save interval is 1 -log_config = dict( # config to register logger hook - interval=50, # Interval to print the log - hooks=[dict(type='TextLoggerHook'), - dict(type='TensorboardLoggerHook')]) # The logger used to record the training process. -total_epochs = 36 # Total epochs to train the model -dist_params = dict(backend='nccl') # Parameters to setup distributed training, the port can also be set. -log_level = 'INFO' # The level of logging. -find_unused_parameters = True # Whether to find unused parameters -work_dir = None # Directory to save the model checkpoints and logs for the current experiments. -load_from = None # load models as a pre-trained model from a given path. This will not resume training. -resume_from = None # Resume checkpoints from a given path, the training will be resumed from the epoch when the checkpoint's is saved. -workflow = [('train', 1)] # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once. The workflow trains the model by 36 epochs according to the total_epochs. -gpu_ids = range(0, 1) # ids of gpus -``` - -## FAQ - -### Ignore some fields in the base configs - -Sometimes, you may set `_delete_=True` to ignore some of fields in base configs. -You may refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#inherit-from-base-config-with-ignored-fields) for simple inllustration. - -In MMDetection or MMDetection3D, for example, to change the FPN neck of PointPillars with the following config. - -```python -model = dict( - type='MVXFasterRCNN', - pts_voxel_layer=dict(...), - pts_voxel_encoder=dict(...), - pts_middle_encoder=dict(...), - pts_backbone=dict(...), - pts_neck=dict( - type='FPN', - norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01), - act_cfg=dict(type='ReLU'), - in_channels=[64, 128, 256], - out_channels=256, - start_level=0, - num_outs=3), - pts_bbox_head=dict(...)) -``` - -`FPN` and `SECONDFPN` use different keywords to construct. 
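For intuition, here is a rough sketch of why the two necks cannot share one set of keys. The class names mirror the real ones, but the signatures are simplified stand-ins rather than the actual MMDetection/MMDetection3D definitions.

```python
# Simplified stand-ins (assumed signatures; check the real classes for the full argument lists).
# FPN shares a single integer `out_channels` across levels and needs `num_outs`,
# while SECONDFPN expects per-level lists plus `upsample_strides` and has no `num_outs`.
class FPNLike:
    def __init__(self, in_channels, out_channels, num_outs, start_level=0):
        self.out_channels = out_channels  # a single int, e.g. 256

class SECONDFPNLike:
    def __init__(self, in_channels, out_channels, upsample_strides):
        self.out_channels = out_channels  # a per-level list, e.g. [128, 128, 128]
```

Because keys such as `num_outs` mean nothing to `SECONDFPN`, simply overriding values is not enough, as the child config below shows.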
- -```python -_base_ = '../_base_/models/hv_pointpillars_fpn_nus.py' -model = dict( - pts_neck=dict( - _delete_=True, - type='SECONDFPN', - norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01), - in_channels=[64, 128, 256], - upsample_strides=[1, 2, 4], - out_channels=[128, 128, 128]), - pts_bbox_head=dict(...)) -``` - -The `_delete_=True` would replace all old keys in `pts_neck` field with new keys. - -### Use intermediate variables in configs - -Some intermediate variables are used in the configs files, like `train_pipeline`/`test_pipeline` in datasets. -It's worth noting that when modifying intermediate variables in the children configs, user needs to pass the intermediate variables into corresponding fields again. -For example, we would like to use multi scale strategy to train and test a PointPillars. `train_pipeline`/`test_pipeline` are intermediate variable we would like modify. - -```python -_base_ = './nus-3d.py' -train_pipeline = [ - dict( - type='LoadPointsFromFile', - load_dim=5, - use_dim=5, - file_client_args=file_client_args), - dict( - type='LoadPointsFromMultiSweeps', - sweeps_num=10, - file_client_args=file_client_args), - dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True), - dict( - type='GlobalRotScaleTrans', - rot_range=[-0.3925, 0.3925], - scale_ratio_range=[0.95, 1.05], - translation_std=[0, 0, 0]), - dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5), - dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range), - dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range), - dict(type='ObjectNameFilter', classes=class_names), - dict(type='PointShuffle'), - dict(type='DefaultFormatBundle3D', class_names=class_names), - dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d']) -] -test_pipeline = [ - dict( - type='LoadPointsFromFile', - load_dim=5, - use_dim=5, - file_client_args=file_client_args), - dict( - type='LoadPointsFromMultiSweeps', - sweeps_num=10, - file_client_args=file_client_args), - dict( - type='MultiScaleFlipAug3D', - img_scale=(1333, 800), - pts_scale_ratio=[0.95, 1.0, 1.05], - flip=False, - transforms=[ - dict( - type='GlobalRotScaleTrans', - rot_range=[0, 0], - scale_ratio_range=[1., 1.], - translation_std=[0, 0, 0]), - dict(type='RandomFlip3D'), - dict( - type='PointsRangeFilter', point_cloud_range=point_cloud_range), - dict( - type='DefaultFormatBundle3D', - class_names=class_names, - with_label=False), - dict(type='Collect3D', keys=['points']) - ]) -] -data = dict( - train=dict(pipeline=train_pipeline), - val=dict(pipeline=test_pipeline), - test=dict(pipeline=test_pipeline)) -``` - -We first define the new `train_pipeline`/`test_pipeline` and pass them into `data`. 
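If it is unclear whether the redefined pipelines actually reached the dataset settings, a quick sanity check is to load the merged config with `mmcv`'s `Config` and inspect the `data` field. This is only a sketch; the config path below is a placeholder for your own child config.

```python
from mmcv import Config

# Placeholder path: point this at the child config that redefines train_pipeline/test_pipeline.
cfg = Config.fromfile('configs/pointpillars/my_multiscale_nus_config.py')

# The pipelines used at runtime are the ones stored under `data`, so they should
# now list the transforms from the new train/test pipelines defined above.
print([step['type'] for step in cfg.data.train.pipeline])
print([step['type'] for step in cfg.data.test.pipeline])
# If the train set is wrapped (e.g. in RepeatDataset), look under cfg.data.train.dataset.pipeline instead.
```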
From 07bb04a2ff47699c7b618cd1c382bcd99eb7e81a Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Mon, 21 Dec 2020 20:03:27 +0800 Subject: [PATCH 17/43] Update titles of tutorial 3 --- docs/tutorials/data_pipeline.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/data_pipeline.md b/docs/tutorials/data_pipeline.md index cdcff9b007..5f1b09ecfa 100644 --- a/docs/tutorials/data_pipeline.md +++ b/docs/tutorials/data_pipeline.md @@ -1,4 +1,4 @@ -# Tutorial 3: Custom Data Pipelines +# Tutorial 3: Customize Data Pipelines ## Design of Data pipelines From 3d3951edfa4c7b5d3372c614091fb3143bb5a791 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Mon, 21 Dec 2020 20:26:04 +0800 Subject: [PATCH 18/43] Update 2_new_data_model.md --- docs/2_new_data_model.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/2_new_data_model.md b/docs/2_new_data_model.md index 8a2d6facd7..146a32ce43 100644 --- a/docs/2_new_data_model.md +++ b/docs/2_new_data_model.md @@ -73,6 +73,8 @@ After downloading the data, we need to implement a function to convert both the Specifically, we implement a waymo [converter](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/data_converter/waymo_converter.py) to convert Waymo data into KITTI format and a waymo dataset [class](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/waymo_dataset.py) to process it. Because we preprocess the raw data and reorganize it like KITTI, the dataset class could be implemented more easily by inheriting from KittiDataset. The last thing needed to be noted is the evaluation protocol you would like to use. Because Waymo has its own evaluation approach, we further incorporate it into our dataset class. Afterwards, users can successfully convert the data format and use `WaymoDataset` to train and evaluate the model. +For more details about the intermediate results of preprocessing of Waymo dataset, please refer to its [tutorial](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/tutorials/waymo.md). + ## Prepare a config The second step is to prepare configs such that the dataset could be successfully loaded. In addition, adjusting hyperparameters is usually necessary to obtain decent performance in 3D detection. From 04beb957382d34623fbd0f36bb45b6bb0710b26c Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Mon, 21 Dec 2020 20:29:31 +0800 Subject: [PATCH 19/43] Update 1_exist_data_model.md --- docs/1_exist_data_model.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/1_exist_data_model.md b/docs/1_exist_data_model.md index a0752c3619..dab1f54041 100644 --- a/docs/1_exist_data_model.md +++ b/docs/1_exist_data_model.md @@ -4,7 +4,7 @@ Here we provide testing scripts to evaluate a whole dataset (SUNRGBD, ScanNet, KITTI, etc.). -For high-level apis easier to integrated into other projects and basic demos, please refer to Demo under [Get Started](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/getting_started.md). +For high-level apis easier to integrated into other projects and basic demos, please refer to Demo under [Get Started](./getting_started.md). 
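As a quick taste of the high-level API mentioned above, single-sample inference could look roughly like the sketch below. The exact helper names live in `mmdet3d.apis` and may differ between versions (e.g. `init_detector` vs `init_model`), and the checkpoint/point-cloud paths are placeholders, so please verify everything against your installation.

```python
from mmdet3d.apis import inference_detector, init_detector  # names assumed, check mmdet3d.apis

config_file = 'configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py'
checkpoint_file = 'checkpoints/second_kitti_car.pth'  # placeholder: a downloaded checkpoint

# Build the model from a config plus a trained checkpoint, then run it on one point cloud file.
model = init_detector(config_file, checkpoint_file, device='cuda:0')
result, data = inference_detector(model, 'demo/kitti_000008.bin')  # placeholder .bin file
```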
### Test existing models on standard datasets From 94ad94dbac704a30589d90dde941d4d1403e206d Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Mon, 21 Dec 2020 20:31:20 +0800 Subject: [PATCH 20/43] Update links to relative paths --- docs/2_new_data_model.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/2_new_data_model.md b/docs/2_new_data_model.md index 146a32ce43..af837b12e1 100644 --- a/docs/2_new_data_model.md +++ b/docs/2_new_data_model.md @@ -71,15 +71,15 @@ Specific annotation format is described in the official object development [kit] Assume we use the Waymo dataset. After downloading the data, we need to implement a function to convert both the input data and annotation format into the KITTI style. Then we can implement WaymoDataset inherited from KittiDataset to load the data and perform training and evaluation. -Specifically, we implement a waymo [converter](https://github.com/open-mmlab/mmdetection3d/blob/master/tools/data_converter/waymo_converter.py) to convert Waymo data into KITTI format and a waymo dataset [class](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/datasets/waymo_dataset.py) to process it. Because we preprocess the raw data and reorganize it like KITTI, the dataset class could be implemented more easily by inheriting from KittiDataset. The last thing needed to be noted is the evaluation protocol you would like to use. Because Waymo has its own evaluation approach, we further incorporate it into our dataset class. Afterwards, users can successfully convert the data format and use `WaymoDataset` to train and evaluate the model. +Specifically, we implement a waymo [converter](../tools/data_converter/waymo_converter.py) to convert Waymo data into KITTI format and a waymo dataset [class](../mmdet3d/datasets/waymo_dataset.py) to process it. Because we preprocess the raw data and reorganize it like KITTI, the dataset class could be implemented more easily by inheriting from KittiDataset. The last thing needed to be noted is the evaluation protocol you would like to use. Because Waymo has its own evaluation approach, we further incorporate it into our dataset class. Afterwards, users can successfully convert the data format and use `WaymoDataset` to train and evaluate the model. -For more details about the intermediate results of preprocessing of Waymo dataset, please refer to its [tutorial](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/tutorials/waymo.md). +For more details about the intermediate results of preprocessing of Waymo dataset, please refer to its [tutorial](./tutorials/waymo.md). ## Prepare a config The second step is to prepare configs such that the dataset could be successfully loaded. In addition, adjusting hyperparameters is usually necessary to obtain decent performance in 3D detection. 
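To make "adjusting hyperparameters" concrete, the fields that usually have to be revisited for a new dataset are the dataset-dependent ones; the values below are illustrative placeholders rather than the actual Waymo settings.

```python
# Illustrative placeholders only: the real values depend on the sensor range and the annotations.
point_cloud_range = [-80.0, -80.0, -2.0, 80.0, 80.0, 4.0]  # spatial extent covered by the LiDAR
class_names = ['Car', 'Pedestrian', 'Cyclist']             # categories annotated in the new dataset
voxel_size = [0.32, 0.32, 6.0]                             # resolution chosen to match the range above

model = dict(
    pts_voxel_layer=dict(
        voxel_size=voxel_size,
        point_cloud_range=point_cloud_range))
```

Anchor sizes and the training schedule (learning rate, number of epochs) typically also need to be re-tuned for the new data.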
-Suppose we would like to train PointPillars on Waymo to achieve 3D detection for 3 classes (vehicle, cyclist and pedestrian). We need to prepare a dataset config like [this](../mmdet3d/datasets/waymo_dataset.py), a model config like [this](../configs/_base_/models/hv_pointpillars_secfpn_waymo.py) and combine them like [this](../configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymoD5-3d-3class.py), compared to the KITTI [dataset config](../configs/_base_/datasets/kitti-3d-3class.py), [model config](../configs/_base_/models/hv_pointpillars_secfpn_kitti.py) and [overall config](../configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py).

## Train a new model

To train a model with the new config, you can simply run

```shell
python tools/train.py configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymoD5-3d-3class.py
```

For more detailed usages, please refer to [Case 1](1_exist_data_model.md).

## Test and inference

To test the trained model, you can simply run

```shell
python tools/test.py configs/pointpillars/hv_pointpillars_secfpn_sbn_2x16_2x_waymoD5-3d-3class.py work_dirs/hv_pointpillars_secfpn_sbn_2x16_2x_waymoD5-3d-3class/latest.pth --eval waymo
```

**Note**: To use the Waymo evaluation protocol, you need to follow the [tutorial](tutorials/waymo.md) and prepare the files related to metrics computation following the official instructions.

For more detailed usages of test and inference, please refer to [Case 1](1_exist_data_model.md).

From 2bbb3dbeabb561dfdb8e52ca8851a133dbfda6fe Mon Sep 17 00:00:00 2001
From: twang <30491025+Tai-Wang@users.noreply.github.com>
Date: Mon, 21 Dec 2020 20:55:32 +0800
Subject: [PATCH 21/43] Tutorial 2 revised

---
 docs/tutorials/customize_dataset.md | 344 +++++++++++++++++++++++++++
 docs/tutorials/new_dataset.md | 356 ----------------------------
 2 files changed, 344 insertions(+), 356 deletions(-)
 create mode 100644 docs/tutorials/customize_dataset.md
 delete mode 100644 docs/tutorials/new_dataset.md

diff --git a/docs/tutorials/customize_dataset.md b/docs/tutorials/customize_dataset.md
new file mode 100644
index 0000000000..a8074b9da5
--- /dev/null
+++ b/docs/tutorials/customize_dataset.md
@@ -0,0 +1,344 @@
+# Tutorial 2: Customize Datasets
+
+## Support new data format
+
+To support a new data format, you can either convert it to existing formats or directly convert it to the middle format. You could also choose to convert it offline (before training, by a script) or online (implement a new dataset and do the conversion during training). 
In MMDetection3D, for data that is inconvenient to read directly online, we recommend converting it into the KITTI format and doing the conversion offline, so that you only need to modify the config's data annotation paths and classes after the conversion.
+For data that shares a similar format with existing datasets, like Lyft compared to nuScenes, we recommend directly implementing a data converter and a dataset class. During this procedure, inheritance can be considered to reduce the implementation workload.
+
+### Reorganize new data formats to existing format
+
+For data that is inconvenient to read directly online, the simplest way is to convert your dataset to existing dataset formats.
+
+Typically we need a data converter to reorganize the raw data and convert the annotation format into KITTI style. Then a new dataset class inherited from existing ones is sometimes necessary to deal with some specific differences between datasets. Finally, the users need to further modify the config files to use the dataset. An [example](../2_new_data_model.md) of training predefined models on the Waymo dataset by converting it into KITTI style can be taken for reference.
+
+### Reorganize new data format to middle format
+
+It is also fine if you do not want to convert the annotation format to existing formats.
+Actually, we convert all the supported datasets into pickle files, which summarize useful information for model training and inference.
+
+The annotation of a dataset is a list of dicts; each dict corresponds to a frame.
+A basic example (used in KITTI) is as follows. A frame consists of several keys, like `image`, `point_cloud`, `calib` and `annos`.
+As long as we can directly read data according to this information, the organization of the raw data can also be different from existing ones.
+With this design, we provide an alternative choice for customizing datasets.
+
+```python
+
+[
+    {'image': {'image_idx': 0, 'image_path': 'training/image_2/000000.png', 'image_shape': array([ 370, 1224], dtype=int32)},
+     'point_cloud': {'num_features': 4, 'velodyne_path': 'training/velodyne/000000.bin'},
+     'calib': {'P0': array([[707.0493, 0. , 604.0814, 0. ],
+       [ 0. , 707.0493, 180.5066, 0. ],
+       [ 0. , 0. , 1. , 0. ],
+       [ 0. , 0. , 0. , 1. ]]),
+       'P1': array([[ 707.0493, 0. , 604.0814, -379.7842],
+       [ 0. , 707.0493, 180.5066, 0. ],
+       [ 0. , 0. , 1. , 0. ],
+       [ 0. , 0. , 0. , 1. ]]),
+       'P2': array([[ 7.070493e+02, 0.000000e+00, 6.040814e+02, 4.575831e+01],
+       [ 0.000000e+00, 7.070493e+02, 1.805066e+02, -3.454157e-01],
+       [ 0.000000e+00, 0.000000e+00, 1.000000e+00, 4.981016e-03],
+       [ 0.000000e+00, 0.000000e+00, 0.000000e+00, 1.000000e+00]]),
+       'P3': array([[ 7.070493e+02, 0.000000e+00, 6.040814e+02, -3.341081e+02],
+       [ 0.000000e+00, 7.070493e+02, 1.805066e+02, 2.330660e+00],
+       [ 0.000000e+00, 0.000000e+00, 1.000000e+00, 3.201153e-03],
+       [ 0.000000e+00, 0.000000e+00, 0.000000e+00, 1.000000e+00]]),
+       'R0_rect': array([[ 0.9999128 , 0.01009263, -0.00851193, 0. ],
+       [-0.01012729, 0.9999406 , -0.00403767, 0. ],
+       [ 0.00847068, 0.00412352, 0.9999556 , 0. ],
+       [ 0. , 0. , 0. , 1. ]]),
+       'Tr_velo_to_cam': array([[ 0.00692796, -0.9999722 , -0.00275783, -0.02457729],
+       [-0.00116298, 0.00274984, -0.9999955 , -0.06127237],
+       [ 0.9999753 , 0.00693114, -0.0011439 , -0.3321029 ],
+       [ 0. , 0. , 0. , 1. 
]]), + 'Tr_imu_to_velo': array([[ 9.999976e-01, 7.553071e-04, -2.035826e-03, -8.086759e-01], + [-7.854027e-04, 9.998898e-01, -1.482298e-02, 3.195559e-01], + [ 2.024406e-03, 1.482454e-02, 9.998881e-01, -7.997231e-01], + [ 0.000000e+00, 0.000000e+00, 0.000000e+00, 1.000000e+00]])}, + 'annos': {'name': array(['Pedestrian'], dtype=' (n, 4), - 'labels': (n, ), - 'bboxes_ignore': (k, 4), - 'labels_ignore': (k, ) (optional field) - } - }, - ... -] -``` - -There are two ways to work with custom datasets. - -- online conversion - - You can write a new Dataset class inherited from `CustomDataset`, and overwrite two methods - `load_annotations(self, ann_file)` and `get_ann_info(self, idx)`, - like [CocoDataset](https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/coco.py) and [VOCDataset](https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/voc.py). - -- offline conversion - - You can convert the annotation format to the expected format above and save it to - a pickle or json file, like [pascal_voc.py](https://github.com/open-mmlab/mmdetection/blob/master/tools/convert_datasets/pascal_voc.py). - Then you can simply use `CustomDataset`. - -### An example of customized dataset - -Assume the annotation is in a new format in text files. -The bounding boxes annotations are stored in text file `annotation.txt` as the following - -``` -# -000001.jpg -1280 720 -2 -10 20 40 60 1 -20 40 50 60 2 -# -000002.jpg -1280 720 -3 -50 20 40 60 2 -20 40 30 45 2 -30 40 50 60 3 -``` - -We can create a new dataset in `mmdet/datasets/my_dataset.py` to load the data. - -```python -import mmcv -import numpy as np - -from .builder import DATASETS -from .custom import CustomDataset - - -@DATASETS.register_module() -class MyDataset(CustomDataset): - - CLASSES = ('person', 'bicycle', 'car', 'motorcycle') - - def load_annotations(self, ann_file): - ann_list = mmcv.list_from_file(ann_file) - - data_infos = [] - for i, ann_line in enumerate(ann_list): - if ann_line != '#': - continue - - img_shape = ann_list[i + 2].split(' ') - width = int(img_shape[0]) - height = int(img_shape[1]) - bbox_number = int(ann_list[i + 3]) - - anns = ann_line.split(' ') - bboxes = [] - labels = [] - for anns in ann_list[i + 4:i + 4 + bbox_number]: - bboxes.append([float(ann) for ann in anns[:4]]) - labels.append(int(anns[4])) - - data_infos.append( - dict( - filename=ann_list[i + 1], - width=width, - height=height, - ann=dict( - bboxes=np.array(bboxes).astype(np.float32), - labels=np.array(labels).astype(np.int64)) - )) - - return data_infos - - def get_ann_info(self, idx): - return self.data_infos[idx]['ann'] - -``` - -Then in the config, to use `MyDataset` you can modify the config as the following - -```python -dataset_A_train = dict( - type='MyDataset', - ann_file = 'image_list.txt', - pipeline=train_pipeline -) -``` - -## Customize datasets by mixing dataset - -MMDetection also supports to mix dataset for training. -Currently it supports to concat and repeat datasets. - -### Repeat dataset - -We use `RepeatDataset` as wrapper to repeat the dataset. For example, suppose the original dataset is `Dataset_A`, to repeat it, the config looks like the following -```python -dataset_A_train = dict( - type='RepeatDataset', - times=N, - dataset=dict( # This is the original config of Dataset_A - type='Dataset_A', - ... - pipeline=train_pipeline - ) - ) -``` - -### Class balanced dataset - -We use `ClassBalancedDataset` as wrapper to repeat the dataset based on category -frequency. 
The dataset to repeat needs to instantiate function `self.get_cat_ids(idx)` -to support `ClassBalancedDataset`. -For example, to repeat `Dataset_A` with `oversample_thr=1e-3`, the config looks like the following -```python -dataset_A_train = dict( - type='ClassBalancedDataset', - oversample_thr=1e-3, - dataset=dict( # This is the original config of Dataset_A - type='Dataset_A', - ... - pipeline=train_pipeline - ) - ) -``` -You may refer to [source code](../../mmdet/datasets/dataset_wrappers.py) for details. - -### Concatenate dataset - -There two ways to concatenate the dataset. - -1. If the datasets you want to concatenate are in the same type with different annotation files, you can concatenate the dataset configs like the following. - - ```python - dataset_A_train = dict( - type='Dataset_A', - ann_file = ['anno_file_1', 'anno_file_2'], - pipeline=train_pipeline - ) - ``` - -2. In case the dataset you want to concatenate is different, you can concatenate the dataset configs like the following. - - ```python - dataset_A_train = dict() - dataset_B_train = dict() - - data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train = [ - dataset_A_train, - dataset_B_train - ], - val = dataset_A_val, - test = dataset_A_test - ) - ``` - - -A more complex example that repeats `Dataset_A` and `Dataset_B` by N and M times, respectively, and then concatenates the repeated datasets is as the following. - -```python -dataset_A_train = dict( - type='RepeatDataset', - times=N, - dataset=dict( - type='Dataset_A', - ... - pipeline=train_pipeline - ) -) -dataset_A_val = dict( - ... - pipeline=test_pipeline -) -dataset_A_test = dict( - ... - pipeline=test_pipeline -) -dataset_B_train = dict( - type='RepeatDataset', - times=M, - dataset=dict( - type='Dataset_B', - ... - pipeline=train_pipeline - ) -) -data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train = [ - dataset_A_train, - dataset_B_train - ], - val = dataset_A_val, - test = dataset_A_test -) - -``` - -### Modify classes of existing dataset - -With existing dataset types, we can modify the class names of them to train subset of the dataset. -For example, if you want to train only three classes of the current dataset, -you can modify the classes of dataset. -The dataset will subtract subset of the data which contains at least one class in the `classes`. - -```python -classes = ('person', 'bicycle', 'car') -data = dict( - train=dict(classes=classes), - val=dict(classes=classes), - test=dict(classes=classes)) -``` - -MMDetection V2.0 also supports to read the classes from a file, which is common in real applications. -For example, assume the `classes.txt` contains the name of classes as the following. - -``` -person -bicycle -car -``` - -Users can set the classes as a file path, the dataset will load it and convert it to a list automatically. 
-```python -classes = 'path/to/classes.txt' -data = dict( - train=dict(classes=classes), - val=dict(classes=classes), - test=dict(classes=classes)) -``` From 2934c3095008d78e82d8aed9482289844781827d Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Mon, 21 Dec 2020 20:56:08 +0800 Subject: [PATCH 22/43] Delete finetune.md --- docs/tutorials/finetune.md | 84 -------------------------------------- 1 file changed, 84 deletions(-) delete mode 100644 docs/tutorials/finetune.md diff --git a/docs/tutorials/finetune.md b/docs/tutorials/finetune.md deleted file mode 100644 index db051e45ff..0000000000 --- a/docs/tutorials/finetune.md +++ /dev/null @@ -1,84 +0,0 @@ -# Tutorial 1: Finetuning Models - -Detectors pre-trained on the COCO dataset can serve as a good pre-trained model for other datasets, e.g., CityScapes and KITTI Dataset. -This tutorial provides instruction for users to use the models provided in the [Model Zoo](../model_zoo.md) for other datasets to obtain better performance. - -There are two steps to finetune a model on a new dataset. -- Add support for the new dataset following [Tutorial 2: Adding New Dataset](new_dataset.md). -- Modify the configs as will be discussed in this tutorial. - - -Take the finetuning process on Cityscapes Dataset as an example, the users need to modify five parts in the config. - -## Inherit base configs -To release the burden and reduce bugs in writing the whole configs, MMDetection V2.0 support inheriting configs from multiple existing configs. To finetune a Mask RCNN model, the new config needs to inherit -`_base_/models/mask_rcnn_r50_fpn.py` to build the basic structure of the model. To use the Cityscapes Dataset, the new config can also simply inherit `_base_/datasets/cityscapes_instance.py`. For runtime settings such as training schedules, the new config needs to inherit `_base_/default_runtime.py`. This configs are in the `configs` directory and the users can also choose to write the whole contents rather than use inheritance. - -```python -_base_ = [ - '../_base_/models/mask_rcnn_r50_fpn.py', - '../_base_/datasets/cityscapes_instance.py', '../_base_/default_runtime.py' -] -``` - -## Modify head -Then the new config needs to modify the head according to the class numbers of the new datasets. By only changing `num_classes` in the roi_head, the weights of the pre-trained models are mostly reused except the final prediction head. - -```python -model = dict( - pretrained=None, - roi_head=dict( - bbox_head=dict( - type='Shared2FCBBoxHead', - in_channels=256, - fc_out_channels=1024, - roi_feat_size=7, - num_classes=8, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), - reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)), - mask_head=dict( - type='FCNMaskHead', - num_convs=4, - in_channels=256, - conv_out_channels=256, - num_classes=8, - loss_mask=dict( - type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)))) -``` - -## Modify dataset -The users may also need to prepare the dataset and write the configs about dataset. MMDetection V2.0 already support VOC, WIDER FACE, COCO and Cityscapes Dataset. - -## Modify training schedule -The finetuning hyperparameters vary from the default schedule. 
It usually requires smaller learning rate and less training epochs - -```python -# optimizer -# lr is set for a batch size of 8 -optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) -optimizer_config = dict(grad_clip=None) -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=0.001, - # [7] yields higher performance than [6] - step=[7]) -total_epochs = 8 # actual epoch = 8 * 8 = 64 -log_config = dict(interval=100) -``` - -## Use pre-trained model -To use the pre-trained model, the new config add the link of pre-trained models in the `load_from`. The users might need to download the model weights before training to avoid the download time during training. - -```python -load_from = 'https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/mask_rcnn_r50_fpn_2x_20181010-41d35c05.pth' # noqa - -``` From b3ac1fee78f7ba666d1f4c2d1a877225df845410 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Mon, 21 Dec 2020 20:57:53 +0800 Subject: [PATCH 23/43] Create customize_models.md --- docs/tutorials/customize_models.md | 369 +++++++++++++++++++++++++++++ 1 file changed, 369 insertions(+) create mode 100644 docs/tutorials/customize_models.md diff --git a/docs/tutorials/customize_models.md b/docs/tutorials/customize_models.md new file mode 100644 index 0000000000..8f76980bd8 --- /dev/null +++ b/docs/tutorials/customize_models.md @@ -0,0 +1,369 @@ +# Tutorial 4: Customize Models + +We basically categorize model components into 5 types. + +- backbone: usually an FCN network to extract feature maps, e.g., ResNet, MobileNet. +- neck: the component between backbones and heads, e.g., FPN, PAFPN. +- head: the component for specific tasks, e.g., bbox prediction and mask prediction. +- roi extractor: the part for extracting RoI features from feature maps, e.g., RoI Align. +- loss: the component in head for calculating losses, e.g., FocalLoss, L1Loss, and GHMLoss. + +## Develop new components + +### Add a new backbone + +Here we show how to develop new components with an example of MobileNet. + +#### 1. Define a new backbone (e.g. MobileNet) + +Create a new file `mmdet/models/backbones/mobilenet.py`. + +```python +import torch.nn as nn + +from ..builder import BACKBONES + + +@BACKBONES.register_module() +class MobileNet(nn.Module): + + def __init__(self, arg1, arg2): + pass + + def forward(self, x): # should return a tuple + pass + + def init_weights(self, pretrained=None): + pass +``` + +#### 2. Import the module + +You can either add the following line to `mmdet/models/backbones/__init__.py` + +```python +from .mobilenet import MobileNet +``` + +or alternatively add + +```python +custom_imports = dict( + imports=['mmdet.models.backbones.mobilenet'], + allow_failed_imports=False) +``` + +to the config file to avoid modifying the original code. + +#### 3. Use the backbone in your config file + +```python +model = dict( + ... + backbone=dict( + type='MobileNet', + arg1=xxx, + arg2=xxx), + ... +``` + +### Add new necks + +#### 1. Define a neck (e.g. PAFPN) + +Create a new file `mmdet/models/necks/pafpn.py`. + +```python +from ..builder import NECKS + +@NECKS.register +class PAFPN(nn.Module): + + def __init__(self, + in_channels, + out_channels, + num_outs, + start_level=0, + end_level=-1, + add_extra_convs=False): + pass + + def forward(self, inputs): + # implementation is ignored + pass +``` + +#### 2. 
Import the module + +You can either add the following line to `mmdet/models/necks/__init__.py`, + +```python +from .pafpn import PAFPN +``` + +or alternatively add + +```python +custom_imports = dict( + imports=['mmdet.models.necks.mobilenet'], + allow_failed_imports=False) +``` + +to the config file and avoid modifying the original code. + +#### 3. Modify the config file + +```python +neck=dict( + type='PAFPN', + in_channels=[256, 512, 1024, 2048], + out_channels=256, + num_outs=5) +``` + +### Add new heads + +Here we show how to develop a new head with the example of [Double Head R-CNN](https://arxiv.org/abs/1904.06493) as the following. + +First, add a new bbox head in `mmdet/models/roi_heads/bbox_heads/double_bbox_head.py`. +Double Head R-CNN implements a new bbox head for object detection. +To implement a bbox head, basically we need to implement three functions of the new module as the following. + +```python +from mmdet.models.builder import HEADS +from .bbox_head import BBoxHead + +@HEADS.register_module() +class DoubleConvFCBBoxHead(BBoxHead): + r"""Bbox head used in Double-Head R-CNN + + /-> cls + /-> shared convs -> + \-> reg + roi features + /-> cls + \-> shared fc -> + \-> reg + """ # noqa: W605 + + def __init__(self, + num_convs=0, + num_fcs=0, + conv_out_channels=1024, + fc_out_channels=1024, + conv_cfg=None, + norm_cfg=dict(type='BN'), + **kwargs): + kwargs.setdefault('with_avg_pool', True) + super(DoubleConvFCBBoxHead, self).__init__(**kwargs) + + def init_weights(self): + # conv layers are already initialized by ConvModule + + def forward(self, x_cls, x_reg): + +``` + +Second, implement a new RoI Head if it is necessary. We plan to inherit the new `DoubleHeadRoIHead` from `StandardRoIHead`. We can find that a `StandardRoIHead` already implements the following functions. + +```python +import torch + +from mmdet.core import bbox2result, bbox2roi, build_assigner, build_sampler +from ..builder import HEADS, build_head, build_roi_extractor +from .base_roi_head import BaseRoIHead +from .test_mixins import BBoxTestMixin, MaskTestMixin + + +@HEADS.register_module() +class StandardRoIHead(BaseRoIHead, BBoxTestMixin, MaskTestMixin): + """Simplest base roi head including one bbox head and one mask head. + """ + + def init_assigner_sampler(self): + + def init_bbox_head(self, bbox_roi_extractor, bbox_head): + + def init_mask_head(self, mask_roi_extractor, mask_head): + + def init_weights(self, pretrained): + + def forward_dummy(self, x, proposals): + + + def forward_train(self, + x, + img_metas, + proposal_list, + gt_bboxes, + gt_labels, + gt_bboxes_ignore=None, + gt_masks=None): + + def _bbox_forward(self, x, rois): + + def _bbox_forward_train(self, x, sampling_results, gt_bboxes, gt_labels, + img_metas): + + def _mask_forward_train(self, x, sampling_results, bbox_feats, gt_masks, + img_metas): + + def _mask_forward(self, x, rois=None, pos_inds=None, bbox_feats=None): + + + def simple_test(self, + x, + proposal_list, + img_metas, + proposals=None, + rescale=False): + """Test without augmentation.""" + +``` + +Double Head's modification is mainly in the bbox_forward logic, and it inherits other logics from the `StandardRoIHead`. 
+In the `mmdet/models/roi_heads/double_roi_head.py`, we implement the new RoI Head as the following: + +```python +from ..builder import HEADS +from .standard_roi_head import StandardRoIHead + + +@HEADS.register_module() +class DoubleHeadRoIHead(StandardRoIHead): + """RoI head for Double Head RCNN + + https://arxiv.org/abs/1904.06493 + """ + + def __init__(self, reg_roi_scale_factor, **kwargs): + super(DoubleHeadRoIHead, self).__init__(**kwargs) + self.reg_roi_scale_factor = reg_roi_scale_factor + + def _bbox_forward(self, x, rois): + bbox_cls_feats = self.bbox_roi_extractor( + x[:self.bbox_roi_extractor.num_inputs], rois) + bbox_reg_feats = self.bbox_roi_extractor( + x[:self.bbox_roi_extractor.num_inputs], + rois, + roi_scale_factor=self.reg_roi_scale_factor) + if self.with_shared_head: + bbox_cls_feats = self.shared_head(bbox_cls_feats) + bbox_reg_feats = self.shared_head(bbox_reg_feats) + cls_score, bbox_pred = self.bbox_head(bbox_cls_feats, bbox_reg_feats) + + bbox_results = dict( + cls_score=cls_score, + bbox_pred=bbox_pred, + bbox_feats=bbox_cls_feats) + return bbox_results +``` + +Last, the users need to add the module in +`mmdet/models/bbox_heads/__init__.py` and `mmdet/models/roi_heads/__init__.py` thus the corresponding registry could find and load them. + +Alternatively, the users can add + +```python +custom_imports=dict( + imports=['mmdet.models.roi_heads.double_roi_head', 'mmdet.models.bbox_heads.double_bbox_head']) +``` + +to the config file and achieve the same goal. + +The config file of Double Head R-CNN is as the following + +```python +_base_ = '../faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py' +model = dict( + roi_head=dict( + type='DoubleHeadRoIHead', + reg_roi_scale_factor=1.3, + bbox_head=dict( + _delete_=True, + type='DoubleConvFCBBoxHead', + num_convs=4, + num_fcs=2, + in_channels=256, + conv_out_channels=1024, + fc_out_channels=1024, + roi_feat_size=7, + num_classes=80, + bbox_coder=dict( + type='DeltaXYWHBBoxCoder', + target_means=[0., 0., 0., 0.], + target_stds=[0.1, 0.1, 0.2, 0.2]), + reg_class_agnostic=False, + loss_cls=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=2.0), + loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=2.0)))) + +``` + +Since MMDetection 2.0, the config system supports to inherit configs such that the users can focus on the modification. +The Double Head R-CNN mainly uses a new DoubleHeadRoIHead and a new +`DoubleConvFCBBoxHead`, the arguments are set according to the `__init__` function of each module. + +### Add new loss + +Assume you want to add a new loss as `MyLoss`, for bounding box regression. +To add a new loss function, the users need implement it in `mmdet/models/losses/my_loss.py`. +The decorator `weighted_loss` enable the loss to be weighted for each element. 
+ +```python +import torch +import torch.nn as nn + +from ..builder import LOSSES +from .utils import weighted_loss + +@weighted_loss +def my_loss(pred, target): + assert pred.size() == target.size() and target.numel() > 0 + loss = torch.abs(pred - target) + return loss + +@LOSSES.register_module() +class MyLoss(nn.Module): + + def __init__(self, reduction='mean', loss_weight=1.0): + super(MyLoss, self).__init__() + self.reduction = reduction + self.loss_weight = loss_weight + + def forward(self, + pred, + target, + weight=None, + avg_factor=None, + reduction_override=None): + assert reduction_override in (None, 'none', 'mean', 'sum') + reduction = ( + reduction_override if reduction_override else self.reduction) + loss_bbox = self.loss_weight * my_loss( + pred, target, weight, reduction=reduction, avg_factor=avg_factor) + return loss_bbox +``` + +Then the users need to add it in the `mmdet/models/losses/__init__.py`. + +```python +from .my_loss import MyLoss, my_loss + +``` + +Alternatively, you can add + +```python +custom_imports=dict( + imports=['mmdet.models.losses.my_loss']) +``` + +to the config file and achieve the same goal. + +To use it, modify the `loss_xxx` field. +Since MyLoss is for regression, you need to modify the `loss_bbox` field in the head. + +```python +loss_bbox=dict(type='MyLoss', loss_weight=1.0)) +``` From c8303cf61be78d6547f486586aa5c0e83b7daf9b Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Mon, 21 Dec 2020 20:58:39 +0800 Subject: [PATCH 24/43] Create customize_runtime.md --- docs/tutorials/customize_runtime.md | 319 ++++++++++++++++++++++++++++ 1 file changed, 319 insertions(+) create mode 100644 docs/tutorials/customize_runtime.md diff --git a/docs/tutorials/customize_runtime.md b/docs/tutorials/customize_runtime.md new file mode 100644 index 0000000000..3f52b2732e --- /dev/null +++ b/docs/tutorials/customize_runtime.md @@ -0,0 +1,319 @@ +# Tutorial 5: Customize Runtime Settings + +## Customize optimization settings + +### Customize optimizer supported by Pytorch + +We already support to use all the optimizers implemented by PyTorch, and the only modification is to change the `optimizer` field of config files. +For example, if you want to use `ADAM` (note that the performance could drop a lot), the modification could be as the following. + +```python +optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001) +``` + +To modify the learning rate of the model, the users only need to modify the `lr` in the config of optimizer. The users can directly set arguments following the [API doc](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) of PyTorch. + +### Customize self-implemented optimizer + +#### 1. Define a new optimizer + +A customized optimizer could be defined as following. + +Assume you want to add a optimizer named `MyOptimizer`, which has arguments `a`, `b`, and `c`. +You need to create a new directory named `mmdet/core/optimizer`. +And then implement the new optimizer in a file, e.g., in `mmdet/core/optimizer/my_optimizer.py`: + +```python +from .registry import OPTIMIZERS +from torch.optim import Optimizer + + +@OPTIMIZERS.register_module() +class MyOptimizer(Optimizer): + + def __init__(self, a, b, c) + +``` + +#### 2. Add the optimizer to registry + +To find the above module defined above, this module should be imported into the main namespace at first. There are two options to achieve it. + +- Modify `mmdet/core/optimizer/__init__.py` to import it. 
+
+  The newly defined module should be imported in `mmdet/core/optimizer/__init__.py` so that the registry will
+  find the new module and add it:
+
+```python
+from .my_optimizer import MyOptimizer
+```
+
+- Use `custom_imports` in the config to manually import it
+
+```python
+custom_imports = dict(imports=['mmdet.core.optimizer.my_optimizer'], allow_failed_imports=False)
+```
+
+The module `mmdet.core.optimizer.my_optimizer` will be imported at the beginning of the program and the class `MyOptimizer` is then automatically registered.
+Note that only the package containing the class `MyOptimizer` should be imported.
+`mmdet.core.optimizer.my_optimizer.MyOptimizer` **cannot** be imported directly.
+
+Actually, users can use a totally different file directory structure with this importing method, as long as the module root can be located in `PYTHONPATH`.
+
+#### 3. Specify the optimizer in the config file
+
+Then you can use `MyOptimizer` in the `optimizer` field of config files.
+In the configs, the optimizers are defined by the field `optimizer` like the following:
+
+```python
+optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
+```
+
+To use your own optimizer, the field can be changed to
+
+```python
+optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value)
+```
+
+### Customize optimizer constructor
+
+Some models may have some parameter-specific settings for optimization, e.g. weight decay for BatchNorm layers.
+The users can do such fine-grained parameter tuning by customizing the optimizer constructor.
+
+```python
+from mmcv.utils import build_from_cfg
+
+from mmcv.runner.optimizer import OPTIMIZER_BUILDERS, OPTIMIZERS
+from mmdet.utils import get_root_logger
+from .my_optimizer import MyOptimizer
+
+
+@OPTIMIZER_BUILDERS.register_module()
+class MyOptimizerConstructor(object):
+
+    def __init__(self, optimizer_cfg, paramwise_cfg=None):
+
+    def __call__(self, model):
+
+        return my_optimizer
+
+```
+
+The default optimizer constructor is implemented [here](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/optimizer/default_constructor.py#L11), which could also serve as a template for a new optimizer constructor.
+
+### Additional settings
+
+Tricks not implemented by the optimizer should be implemented through the optimizer constructor (e.g., setting parameter-wise learning rates) or hooks. We list some common settings that could stabilize or accelerate the training. Feel free to create a PR or an issue for more settings.
+
+- __Use gradient clip to stabilize training__:
+  Some models need gradient clipping to stabilize the training process. An example is as below:
+
+  ```python
+  optimizer_config = dict(
+      _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
+  ```
+
+  If your config inherits the base config which already sets the `optimizer_config`, you might need `_delete_=True` to override the unnecessary settings. See the [config documentation](https://mmdetection.readthedocs.io/en/latest/config.html) for more details.
+
+- __Use momentum schedule to accelerate model convergence__:
+  We support a momentum scheduler that modifies the model's momentum according to the learning rate, which could make the model converge faster.
+  The momentum scheduler is usually used together with the LR scheduler; for example, the following config is used in 3D detection to accelerate convergence.
+  For more details, please refer to the implementation of [CyclicLrUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L327) and [CyclicMomentumUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/momentum_updater.py#L130).
+
+  ```python
+  lr_config = dict(
+      policy='cyclic',
+      target_ratio=(10, 1e-4),
+      cyclic_times=1,
+      step_ratio_up=0.4,
+  )
+  momentum_config = dict(
+      policy='cyclic',
+      target_ratio=(0.85 / 0.95, 1),
+      cyclic_times=1,
+      step_ratio_up=0.4,
+  )
+  ```
+
+## Customize training schedules
+
+By default we use a step learning rate with the 1x schedule; this calls [`StepLRHook`](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L153) in MMCV.
+We support many other learning rate schedules [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py), such as the `CosineAnnealing` and `Poly` schedules. Here are some examples:
+
+- Poly schedule:
+
+  ```python
+  lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
+  ```
+
+- CosineAnnealing schedule:
+
+  ```python
+  lr_config = dict(
+      policy='CosineAnnealing',
+      warmup='linear',
+      warmup_iters=1000,
+      warmup_ratio=1.0 / 10,
+      min_lr_ratio=1e-5)
+  ```
+
+## Customize workflow
+
+Workflow is a list of (phase, epochs) to specify the running order and epochs.
+By default it is set to be
+
+```python
+workflow = [('train', 1)]
+```
+
+which means running 1 epoch for training.
+Sometimes the user may want to check some metrics (e.g. loss, accuracy) of the model on the validation set.
+In such a case, we can set the workflow as
+
+```python
+[('train', 1), ('val', 1)]
+```
+
+so that 1 epoch for training and 1 epoch for validation will be run iteratively.
+
+**Note**:
+
+1. The parameters of the model will not be updated during the val epoch.
+2. Keyword `total_epochs` in the config only controls the number of training epochs and will not affect the validation workflow.
+3. Workflows `[('train', 1), ('val', 1)]` and `[('train', 1)]` will not change the behavior of `EvalHook` because `EvalHook` is called by `after_train_epoch` and the validation workflow only affects hooks that are called through `after_val_epoch`. Therefore, the only difference between `[('train', 1), ('val', 1)]` and `[('train', 1)]` is that the runner will calculate losses on the validation set after each training epoch.
+
+## Customize hooks
+
+### Customize self-implemented hooks
+
+#### 1. Implement a new hook
+
+There are some occasions when the users might need to implement a new hook. MMDetection supports customized hooks in training (#3395) since v2.3.0. Thus the users could implement a hook directly in mmdet or their mmdet-based codebases and use the hook by only modifying the config in training.
+Before v2.3.0, the users needed to modify the code to get the hook registered before training starts.
+Here we give an example of creating a new hook in mmdet and using it in training.
+
+```python
+from mmcv.runner import HOOKS, Hook
+
+
+@HOOKS.register_module()
+class MyHook(Hook):
+
+    def __init__(self, a, b):
+        pass
+
+    def before_run(self, runner):
+        pass
+
+    def after_run(self, runner):
+        pass
+
+    def before_epoch(self, runner):
+        pass
+
+    def after_epoch(self, runner):
+        pass
+
+    def before_iter(self, runner):
+        pass
+
+    def after_iter(self, runner):
+        pass
+```
+
+Depending on the functionality of the hook, the users need to specify what the hook will do at each stage of the training in `before_run`, `after_run`, `before_epoch`, `after_epoch`, `before_iter`, and `after_iter`.
+
+#### 2. Register the new hook
+
+Then we need to make sure `MyHook` is imported. Assuming the file is in `mmdet/core/utils/my_hook.py`, there are two ways to do that:
+
+- Modify `mmdet/core/utils/__init__.py` to import it.
+
+  The newly defined module should be imported in `mmdet/core/utils/__init__.py` so that the registry will
+  find the new module and add it:
+
+```python
+from .my_hook import MyHook
+```
+
+- Use `custom_imports` in the config to manually import it
+
+```python
+custom_imports = dict(imports=['mmdet.core.utils.my_hook'], allow_failed_imports=False)
+```
+
+#### 3. Modify the config
+
+```python
+custom_hooks = [
+    dict(type='MyHook', a=a_value, b=b_value)
+]
+```
+
+You can also set the priority of the hook by setting the key `priority` to `'NORMAL'` or `'HIGHEST'` as below
+
+```python
+custom_hooks = [
+    dict(type='MyHook', a=a_value, b=b_value, priority='NORMAL')
+]
+```
+
+By default the hook's priority is set as `NORMAL` during registration.
+
+### Use hooks implemented in MMCV
+
+If the hook is already implemented in MMCV, you can directly modify the config to use the hook as below
+
+```python
+custom_hooks = [
+    dict(type='MyHook', a=a_value, b=b_value, priority='NORMAL')
+]
+```
+
+### Modify default runtime hooks
+
+There are some common hooks that are not registered through `custom_hooks`; they are:
+
+- log_config
+- checkpoint_config
+- evaluation
+- lr_config
+- optimizer_config
+- momentum_config
+
+Among these hooks, only the logger hook has the `VERY_LOW` priority; the others' priority is `NORMAL`.
+The above-mentioned tutorials already cover how to modify `optimizer_config`, `momentum_config`, and `lr_config`.
+Here we reveal what we can do with `log_config`, `checkpoint_config`, and `evaluation`.
+
+#### Checkpoint config
+
+The MMCV runner will use `checkpoint_config` to initialize [`CheckpointHook`](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/hooks/checkpoint.py#L9).
+
+```python
+checkpoint_config = dict(interval=1)
+```
+
+The users could set `max_keep_ckpts` to save only a small number of checkpoints, or decide whether to store the state dict of the optimizer via `save_optimizer`. More details of the arguments are [here](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.CheckpointHook).
+
+#### Log config
+
+The `log_config` wraps multiple logger hooks and enables setting intervals. Now MMCV supports `WandbLoggerHook`, `MlflowLoggerHook`, and `TensorboardLoggerHook`.
+The detailed usage can be found in the [doc](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.LoggerHook).
+ +```python +log_config = dict( + interval=50, + hooks=[ + dict(type='TextLoggerHook'), + dict(type='TensorboardLoggerHook') + ]) +``` + +#### Evaluation config + +The config of `evaluation` will be used to initialize the [`EvalHook`](https://github.com/open-mmlab/mmdetection/blob/7a404a2c000620d52156774a5025070d9e00d918/mmdet/core/evaluation/eval_hooks.py#L8). +Except the key `interval`, other arguments such as `metric` will be passed to the `dataset.evaluate()` + +```python +evaluation = dict(interval=1, metric='bbox') +``` From c507e9cbe61c52d674af06be0f17f9e26064511e Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Mon, 21 Dec 2020 20:59:25 +0800 Subject: [PATCH 25/43] Update index.rst --- docs/tutorials/index.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/tutorials/index.rst b/docs/tutorials/index.rst index 36c0d577ac..5aa46b3a00 100644 --- a/docs/tutorials/index.rst +++ b/docs/tutorials/index.rst @@ -1,8 +1,9 @@ .. toctree:: :maxdepth: 2 - finetune.md - new_dataset.md + config.md + customize_dataset.md data_pipeline.md - new_modules.md + customize_model.md + customize_runtime.md waymo.md From a5f48aca4d9a631668dba2ebc025bec41a2c13cb Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Mon, 21 Dec 2020 20:59:41 +0800 Subject: [PATCH 26/43] Update waymo.md --- docs/tutorials/waymo.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/waymo.md b/docs/tutorials/waymo.md index 120d4d6d4a..8e7cdb25d9 100644 --- a/docs/tutorials/waymo.md +++ b/docs/tutorials/waymo.md @@ -1,4 +1,4 @@ -# Tutorial 5: Waymo Dataset +# Tutorial 6: Waymo Dataset This page provides specific tutorials about the usage of MMDetection3D for waymo dataset. From 8fa2e36f569763d86a3cba84a54b35dfc92ec267 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Mon, 21 Dec 2020 21:00:47 +0800 Subject: [PATCH 27/43] Delete new_modules.md --- docs/tutorials/new_modules.md | 388 ---------------------------------- 1 file changed, 388 deletions(-) delete mode 100644 docs/tutorials/new_modules.md diff --git a/docs/tutorials/new_modules.md b/docs/tutorials/new_modules.md deleted file mode 100644 index e774028ffc..0000000000 --- a/docs/tutorials/new_modules.md +++ /dev/null @@ -1,388 +0,0 @@ -# Tutorial 4: Adding New Modules - -## Customize optimizer - -A customized optimizer could be defined as following. - -Assume you want to add a optimizer named as `MyOptimizer`, which has arguments `a`, `b`, and `c`. -You need to create a new directory named `mmdet/core/optimizer`. -And then implement the new optimizer in a file, e.g., in `mmdet/core/optimizer/my_optimizer.py`: - -```python -from .registry import OPTIMIZERS -from torch.optim import Optimizer - - -@OPTIMIZERS.register_module() -class MyOptimizer(Optimizer): - - def __init__(self, a, b, c) - -``` - -Then add this module in `mmdet/core/optimizer/__init__.py` thus the registry will -find the new module and add it: - -```python -from .my_optimizer import MyOptimizer -``` - -Then you can use `MyOptimizer` in `optimizer` field of config files. 
-In the configs, the optimizers are defined by the field `optimizer` like the following: -```python -optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) -``` -To use your own optimizer, the field can be changed as -```python -optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value) -``` - -We already support to use all the optimizers implemented by PyTorch, and the only modification is to change the `optimizer` field of config files. -For example, if you want to use `ADAM`, though the performance will drop a lot, the modification could be as the following. -```python -optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001) -``` -The users can directly set arguments following the [API doc](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) of PyTorch. - -## Customize optimizer constructor - -Some models may have some parameter-specific settings for optimization, e.g. weight decay for BatchNoarm layers. -The users can do those fine-grained parameter tuning through customizing optimizer constructor. - -```python -from mmcv.utils import build_from_cfg - -from mmcv.runner.optimizer import OPTIMIZER_BUILDERS, OPTIMIZERS -from mmdet.utils import get_root_logger -from .my_optimizer import MyOptimizer - - -@OPTIMIZER_BUILDERS.register_module() -class MyOptimizerConstructor(object): - - def __init__(self, optimizer_cfg, paramwise_cfg=None): - - def __call__(self, model): - - return my_optimizer - -``` - - -## Develop new components - -We basically categorize model components into 4 types. - -- backbone: usually an FCN network to extract feature maps, e.g., ResNet, MobileNet. -- neck: the component between backbones and heads, e.g., FPN, PAFPN. -- head: the component for specific tasks, e.g., bbox prediction and mask prediction. -- roi extractor: the part for extracting RoI features from feature maps, e.g., RoI Align. - -### Add new backbones - -Here we show how to develop new components with an example of MobileNet. - -1. Create a new file `mmdet/models/backbones/mobilenet.py`. - -```python -import torch.nn as nn - -from ..registry import BACKBONES - - -@BACKBONES.register_module() -class MobileNet(nn.Module): - - def __init__(self, arg1, arg2): - pass - - def forward(self, x): # should return a tuple - pass - - def init_weights(self, pretrained=None): - pass -``` - -2. Import the module in `mmdet/models/backbones/__init__.py`. - -```python -from .mobilenet import MobileNet -``` - -3. Use it in your config file. - -```python -model = dict( - ... - backbone=dict( - type='MobileNet', - arg1=xxx, - arg2=xxx), - ... -``` - -### Add new necks - -Here we take PAFPN as an example. - -1. Create a new file in `mmdet/models/necks/pafpn.py`. - - ```python - from ..registry import NECKS - - @NECKS.register - class PAFPN(nn.Module): - - def __init__(self, - in_channels, - out_channels, - num_outs, - start_level=0, - end_level=-1, - add_extra_convs=False): - pass - - def forward(self, inputs): - # implementation is ignored - pass - ``` - -2. Import the module in `mmdet/models/necks/__init__.py`. - - ```python - from .pafpn import PAFPN - ``` - -3. Modify the config file. - - ```python - neck=dict( - type='PAFPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5) - ``` - -### Add new heads - -Here we show how to develop a new head with the example of [Double Head R-CNN](https://arxiv.org/abs/1904.06493) as the following. - -First, add a new bbox head in `mmdet/models/bbox_heads/double_bbox_head.py`. 
-Double Head R-CNN implements a new bbox head for object detection. -To implement a bbox head, basically we need to implement three functions of the new module as the following. - -```python -@HEADS.register_module() -class DoubleConvFCBBoxHead(BBoxHead): - r"""Bbox head used in Double-Head R-CNN - - /-> cls - /-> shared convs -> - \-> reg - roi features - /-> cls - \-> shared fc -> - \-> reg - """ # noqa: W605 - - def __init__(self, - num_convs=0, - num_fcs=0, - conv_out_channels=1024, - fc_out_channels=1024, - conv_cfg=None, - norm_cfg=dict(type='BN'), - **kwargs): - kwargs.setdefault('with_avg_pool', True) - super(DoubleConvFCBBoxHead, self).__init__(**kwargs) - - def init_weights(self): - # conv layers are already initialized by ConvModule - - def forward(self, x_cls, x_reg): - -``` - -Second, implement a new RoI Head if it is necessary. We plan to inherit the new `DoubleHeadRoIHead` from `StandardRoIHead`. We can find that a `StandardRoIHead` already implements the following functions. - -```python -import torch - -from mmdet.core import bbox2result, bbox2roi, build_assigner, build_sampler -from ..builder import HEADS, build_head, build_roi_extractor -from .base_roi_head import BaseRoIHead -from .test_mixins import BBoxTestMixin, MaskTestMixin - - -@HEADS.register_module() -class StandardRoIHead(BaseRoIHead, BBoxTestMixin, MaskTestMixin): - """Simplest base roi head including one bbox head and one mask head. - """ - - def init_assigner_sampler(self): - - def init_bbox_head(self, bbox_roi_extractor, bbox_head): - - def init_mask_head(self, mask_roi_extractor, mask_head): - - def init_weights(self, pretrained): - - def forward_dummy(self, x, proposals): - - - def forward_train(self, - x, - img_metas, - proposal_list, - gt_bboxes, - gt_labels, - gt_bboxes_ignore=None, - gt_masks=None): - - def _bbox_forward(self, x, rois): - - def _bbox_forward_train(self, x, sampling_results, gt_bboxes, gt_labels, - img_metas): - - def _mask_forward_train(self, x, sampling_results, bbox_feats, gt_masks, - img_metas): - - def _mask_forward(self, x, rois=None, pos_inds=None, bbox_feats=None): - - - def simple_test(self, - x, - proposal_list, - img_metas, - proposals=None, - rescale=False): - """Test without augmentation.""" - -``` - -Double Head's modification is mainly in the bbox_forward logic, and it inherits other logics from the `StandardRoIHead`. 
-In the `mmdet/models/roi_heads/double_roi_head.py`, we implement the new RoI Head as the following: - -```python -from ..builder import HEADS -from .standard_roi_head import StandardRoIHead - - -@HEADS.register_module() -class DoubleHeadRoIHead(StandardRoIHead): - """RoI head for Double Head RCNN - - https://arxiv.org/abs/1904.06493 - """ - - def __init__(self, reg_roi_scale_factor, **kwargs): - super(DoubleHeadRoIHead, self).__init__(**kwargs) - self.reg_roi_scale_factor = reg_roi_scale_factor - - def _bbox_forward(self, x, rois): - bbox_cls_feats = self.bbox_roi_extractor( - x[:self.bbox_roi_extractor.num_inputs], rois) - bbox_reg_feats = self.bbox_roi_extractor( - x[:self.bbox_roi_extractor.num_inputs], - rois, - roi_scale_factor=self.reg_roi_scale_factor) - if self.with_shared_head: - bbox_cls_feats = self.shared_head(bbox_cls_feats) - bbox_reg_feats = self.shared_head(bbox_reg_feats) - cls_score, bbox_pred = self.bbox_head(bbox_cls_feats, bbox_reg_feats) - - bbox_results = dict( - cls_score=cls_score, - bbox_pred=bbox_pred, - bbox_feats=bbox_cls_feats) - return bbox_results -``` - -Last, the users need to add the module in the `mmdet/models/bbox_heads/__init__.py` and `mmdet/models/roi_heads/__init__.py` thus the corresponding registry could find and load them. - -To config file of Double Head R-CNN is as the following - -```python -_base_ = '../faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py' -model = dict( - roi_head=dict( - type='DoubleHeadRoIHead', - reg_roi_scale_factor=1.3, - bbox_head=dict( - _delete_=True, - type='DoubleConvFCBBoxHead', - num_convs=4, - num_fcs=2, - in_channels=256, - conv_out_channels=1024, - fc_out_channels=1024, - roi_feat_size=7, - num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), - reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=2.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=2.0)))) - -``` - -Since MMDetection 2.0, the config system support to inherit configs such that the users can focus on the modification. -The Double Head R-CNN mainly uses a new DoubleHeadRoIHead and a new -`DoubleConvFCBBoxHead`, the arguments are set according to the `__init__` function of each module. - - -### Add new loss - -Assume you want to add a new loss as `MyLoss`, for bounding box regression. -To add a new loss function, the users need implement it in `mmdet/models/losses/my_loss.py`. -The decorator `weighted_loss` enable the loss to be weighted for each element. - -```python -import torch -import torch.nn as nn - -from ..builder import LOSSES -from .utils import weighted_loss - -@weighted_loss -def my_loss(pred, target): - assert pred.size() == target.size() and target.numel() > 0 - loss = torch.abs(pred - target) - return loss - -@LOSSES.register_module() -class MyLoss(nn.Module): - - def __init__(self, reduction='mean', loss_weight=1.0): - super(MyLoss, self).__init__() - self.reduction = reduction - self.loss_weight = loss_weight - - def forward(self, - pred, - target, - weight=None, - avg_factor=None, - reduction_override=None): - assert reduction_override in (None, 'none', 'mean', 'sum') - reduction = ( - reduction_override if reduction_override else self.reduction) - loss_bbox = self.loss_weight * my_loss( - pred, target, weight, reduction=reduction, avg_factor=avg_factor) - return loss_bbox -``` - -Then the users need to add it in the `mmdet/models/losses/__init__.py`. 
-```python -from .my_loss import MyLoss, my_loss - -``` - -To use it, modify the `loss_xxx` field. -Since MyLoss is for regrression, you need to modify the `loss_bbox` field in the head. -```python -loss_bbox=dict(type='MyLoss', loss_weight=1.0)) -``` From 81a2b5499c6c20fcf04b95ff5f965047a43a0d8a Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Mon, 21 Dec 2020 21:28:09 +0800 Subject: [PATCH 28/43] Preliminary update --- docs/tutorials/customize_models.md | 66 ++++++++++++++++++++++++++++-- 1 file changed, 62 insertions(+), 4 deletions(-) diff --git a/docs/tutorials/customize_models.md b/docs/tutorials/customize_models.md index 8f76980bd8..0d6e1cebdf 100644 --- a/docs/tutorials/customize_models.md +++ b/docs/tutorials/customize_models.md @@ -1,15 +1,73 @@ # Tutorial 4: Customize Models -We basically categorize model components into 5 types. +We basically categorize model components into 6 types. -- backbone: usually an FCN network to extract feature maps, e.g., ResNet, MobileNet. -- neck: the component between backbones and heads, e.g., FPN, PAFPN. +- encoder: including voxel layer, voxel encoder and middle encoder used in voxel-based methods before backbone, e.g., HardVFE and PointPillarsScatter. +- backbone: usually an FCN network to extract feature maps, e.g., ResNet, SECOND. +- neck: the component between backbones and heads, e.g., FPN, SECONDFPN. - head: the component for specific tasks, e.g., bbox prediction and mask prediction. -- roi extractor: the part for extracting RoI features from feature maps, e.g., RoI Align. +- roi extractor: the part for extracting RoI features from feature maps, e.g., H3DRoIHead and PartAggregationROIHead. - loss: the component in head for calculating losses, e.g., FocalLoss, L1Loss, and GHMLoss. ## Develop new components +### Add a new encoder + +Here we show how to develop new components with an example of HardVFE. + +#### 1. Define a new voxel encoder (e.g. HardVFE) + +Create a new file `mmdet3d/models/voxel_encoders/voxel_encoder.py`. + +```python +import torch.nn as nn + +from ..builder import VOXEL_ENCODERS + + +@VOXEL_ENCODERS.register_module() +class HardVFE(nn.Module): + + def __init__(self, arg1, arg2): + pass + + def forward(self, x): # should return a tuple + pass + + def init_weights(self, pretrained=None): + pass +``` + +#### 2. Import the module + +You can either add the following line to `mmdet3d/models/voxel_encoders/__init__.py` + +```python +from .voxel_encoder import HardVFE +``` + +or alternatively add + +```python +custom_imports = dict( + imports=['mmdet3d.models.voxel_encoders.HardVFE'], + allow_failed_imports=False) +``` + +to the config file to avoid modifying the original code. + +#### 3. Use the backbone in your config file + +```python +model = dict( + ... + voxel_encoder=dict( + type='HardVFE', + arg1=xxx, + arg2=xxx), + ... +``` + ### Add a new backbone Here we show how to develop new components with an example of MobileNet. 
From d2b2e4b9344dad248137efd9d9ff90b31e23e956 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Tue, 22 Dec 2020 10:48:32 +0800 Subject: [PATCH 29/43] Update customize_models.md --- docs/tutorials/customize_models.md | 355 ++++++++++++++++++----------- 1 file changed, 226 insertions(+), 129 deletions(-) diff --git a/docs/tutorials/customize_models.md b/docs/tutorials/customize_models.md index 0d6e1cebdf..c05eb26825 100644 --- a/docs/tutorials/customize_models.md +++ b/docs/tutorials/customize_models.md @@ -70,11 +70,11 @@ model = dict( ### Add a new backbone -Here we show how to develop new components with an example of MobileNet. +Here we show how to develop new components with an example of SECOND (Sparsely Embedded Convolutional Detection). -#### 1. Define a new backbone (e.g. MobileNet) +#### 1. Define a new backbone (e.g. SECOND) -Create a new file `mmdet/models/backbones/mobilenet.py`. +Create a new file `mmdet3d/models/backbones/second.py`. ```python import torch.nn as nn @@ -83,7 +83,7 @@ from ..builder import BACKBONES @BACKBONES.register_module() -class MobileNet(nn.Module): +class SECOND(nn.Module): def __init__(self, arg1, arg2): pass @@ -97,17 +97,17 @@ class MobileNet(nn.Module): #### 2. Import the module -You can either add the following line to `mmdet/models/backbones/__init__.py` +You can either add the following line to `mmdet3d/models/backbones/__init__.py` ```python -from .mobilenet import MobileNet +from .second import SECOND ``` or alternatively add ```python custom_imports = dict( - imports=['mmdet.models.backbones.mobilenet'], + imports=['mmdet3d.models.backbones.second'], allow_failed_imports=False) ``` @@ -119,7 +119,7 @@ to the config file to avoid modifying the original code. model = dict( ... backbone=dict( - type='MobileNet', + type='SECOND', arg1=xxx, arg2=xxx), ... @@ -127,43 +127,44 @@ model = dict( ### Add new necks -#### 1. Define a neck (e.g. PAFPN) +#### 1. Define a neck (e.g. SECONDFPN) -Create a new file `mmdet/models/necks/pafpn.py`. +Create a new file `mmdet3d/models/necks/second_fpn.py`. ```python from ..builder import NECKS @NECKS.register -class PAFPN(nn.Module): +class SECONDFPN(nn.Module): def __init__(self, - in_channels, - out_channels, - num_outs, - start_level=0, - end_level=-1, - add_extra_convs=False): + in_channels=[128, 128, 256], + out_channels=[256, 256, 256], + upsample_strides=[1, 2, 4], + norm_cfg=dict(type='BN', eps=1e-3, momentum=0.01), + upsample_cfg=dict(type='deconv', bias=False), + conv_cfg=dict(type='Conv2d', bias=False), + use_conv_for_no_stride=False): pass - def forward(self, inputs): + def forward(self, X): # implementation is ignored pass ``` #### 2. Import the module -You can either add the following line to `mmdet/models/necks/__init__.py`, +You can either add the following line to `mmdet3D/models/necks/__init__.py`, ```python -from .pafpn import PAFPN +from .second_fpn import SECONDFPN ``` or alternatively add ```python custom_imports = dict( - imports=['mmdet.models.necks.mobilenet'], + imports=['mmdet3d.models.necks.second_fpn'], allow_failed_imports=False) ``` @@ -173,82 +174,100 @@ to the config file and avoid modifying the original code. 
```python neck=dict( - type='PAFPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5) + type='SECONDFPN', + in_channels=[64, 128, 256], + upsample_strides=[1, 2, 4], + out_channels=[128, 128, 128]) ``` ### Add new heads -Here we show how to develop a new head with the example of [Double Head R-CNN](https://arxiv.org/abs/1904.06493) as the following. +Here we show how to develop a new head with the example of [PartA2 Head](https://arxiv.org/abs/1907.03670) as the following. -First, add a new bbox head in `mmdet/models/roi_heads/bbox_heads/double_bbox_head.py`. -Double Head R-CNN implements a new bbox head for object detection. -To implement a bbox head, basically we need to implement three functions of the new module as the following. +**Note**: Here the example of PartA2 RoI Head is used in the second stage. For one-stage heads, please refer to examples in `mmdet3d/models/dense_heads/`. They are more commonly used in 3D detection for autonomous driving due to its simplicity and high efficiency. + +First, add a new bbox head in `mmdet3d/models/roi_heads/bbox_heads/parta2_bbox_head.py`. +PartA2 RoI Head implements a new bbox head for object detection. +To implement a bbox head, basically we need to implement three functions of the new module as the following. Sometimes other related functions like `loss` and `get_targets` are also required. ```python from mmdet.models.builder import HEADS from .bbox_head import BBoxHead @HEADS.register_module() -class DoubleConvFCBBoxHead(BBoxHead): - r"""Bbox head used in Double-Head R-CNN - - /-> cls - /-> shared convs -> - \-> reg - roi features - /-> cls - \-> shared fc -> - \-> reg - """ # noqa: W605 +class PartA2BboxHead(nn.Module): + """PartA2 RoI head.""" def __init__(self, - num_convs=0, - num_fcs=0, - conv_out_channels=1024, - fc_out_channels=1024, - conv_cfg=None, - norm_cfg=dict(type='BN'), - **kwargs): - kwargs.setdefault('with_avg_pool', True) - super(DoubleConvFCBBoxHead, self).__init__(**kwargs) + num_classes, + seg_in_channels, + part_in_channels, + seg_conv_channels=None, + part_conv_channels=None, + merge_conv_channels=None, + down_conv_channels=None, + shared_fc_channels=None, + cls_channels=None, + reg_channels=None, + dropout_ratio=0.1, + roi_feat_size=14, + with_corner_loss=True, + bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'), + conv_cfg=dict(type='Conv1d'), + norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.01), + loss_bbox=dict( + type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0), + loss_cls=dict( + type='CrossEntropyLoss', + use_sigmoid=True, + reduction='none', + loss_weight=1.0)): + super(PartA2BboxHead, self).__init__() def init_weights(self): # conv layers are already initialized by ConvModule - def forward(self, x_cls, x_reg): + def forward(self, seg_feats, part_feats): ``` -Second, implement a new RoI Head if it is necessary. We plan to inherit the new `DoubleHeadRoIHead` from `StandardRoIHead`. We can find that a `StandardRoIHead` already implements the following functions. +Second, implement a new RoI Head if it is necessary. We plan to inherit the new `PartAggregationROIHead` from `Base3DRoIHead`. We can find that a `Base3DRoIHead` already implements the following functions. 
```python -import torch - -from mmdet.core import bbox2result, bbox2roi, build_assigner, build_sampler -from ..builder import HEADS, build_head, build_roi_extractor -from .base_roi_head import BaseRoIHead -from .test_mixins import BBoxTestMixin, MaskTestMixin +from abc import ABCMeta, abstractmethod +from torch import nn as nn @HEADS.register_module() -class StandardRoIHead(BaseRoIHead, BBoxTestMixin, MaskTestMixin): - """Simplest base roi head including one bbox head and one mask head. - """ +class Base3DRoIHead(nn.Module, metaclass=ABCMeta): + """Base class for 3d RoIHeads.""" - def init_assigner_sampler(self): + def __init__(self, + bbox_head=None, + mask_roi_extractor=None, + mask_head=None, + train_cfg=None, + test_cfg=None): - def init_bbox_head(self, bbox_roi_extractor, bbox_head): + @property + def with_bbox(self): - def init_mask_head(self, mask_roi_extractor, mask_head): + @property + def with_mask(self): + @abstractmethod def init_weights(self, pretrained): - def forward_dummy(self, x, proposals): + @abstractmethod + def init_bbox_head(self): + @abstractmethod + def init_mask_head(self): + @abstractmethod + def init_assigner_sampler(self): + + @abstractmethod def forward_train(self, x, img_metas, @@ -256,116 +275,194 @@ class StandardRoIHead(BaseRoIHead, BBoxTestMixin, MaskTestMixin): gt_bboxes, gt_labels, gt_bboxes_ignore=None, - gt_masks=None): - - def _bbox_forward(self, x, rois): - - def _bbox_forward_train(self, x, sampling_results, gt_bboxes, gt_labels, - img_metas): - - def _mask_forward_train(self, x, sampling_results, bbox_feats, gt_masks, - img_metas): - - def _mask_forward(self, x, rois=None, pos_inds=None, bbox_feats=None): - + **kwargs): def simple_test(self, x, proposal_list, img_metas, proposals=None, - rescale=False): + rescale=False, + **kwargs): """Test without augmentation.""" + pass + + def aug_test(self, x, proposal_list, img_metas, rescale=False, **kwargs): + """Test with augmentations. + If rescale is False, then returned bboxes and masks will fit the scale + of imgs[0]. + """ + pass ``` -Double Head's modification is mainly in the bbox_forward logic, and it inherits other logics from the `StandardRoIHead`. -In the `mmdet/models/roi_heads/double_roi_head.py`, we implement the new RoI Head as the following: +Double Head's modification is mainly in the bbox_forward logic, and it inherits other logics from the `Base3DRoIHead`. +In the `mmdet3d/models/roi_heads/part_aggregation_roi_head.py`, we implement the new RoI Head as the following: ```python -from ..builder import HEADS -from .standard_roi_head import StandardRoIHead +from torch.nn import functional as F +from mmdet3d.core import AssignResult +from mmdet3d.core.bbox import bbox3d2result, bbox3d2roi +from mmdet.core import build_assigner, build_sampler +from mmdet.models import HEADS +from ..builder import build_head, build_roi_extractor +from .base_3droi_head import Base3DRoIHead -@HEADS.register_module() -class DoubleHeadRoIHead(StandardRoIHead): - """RoI head for Double Head RCNN - https://arxiv.org/abs/1904.06493 +@HEADS.register_module() +class PartAggregationROIHead(Base3DRoIHead): + """Part aggregation roi head for PartA2. + Args: + semantic_head (ConfigDict): Config of semantic head. + num_classes (int): The number of classes. + seg_roi_extractor (ConfigDict): Config of seg_roi_extractor. + part_roi_extractor (ConfigDict): Config of part_roi_extractor. + bbox_head (ConfigDict): Config of bbox_head. + train_cfg (ConfigDict): Training config. + test_cfg (ConfigDict): Testing config. 
""" - def __init__(self, reg_roi_scale_factor, **kwargs): - super(DoubleHeadRoIHead, self).__init__(**kwargs) - self.reg_roi_scale_factor = reg_roi_scale_factor - - def _bbox_forward(self, x, rois): - bbox_cls_feats = self.bbox_roi_extractor( - x[:self.bbox_roi_extractor.num_inputs], rois) - bbox_reg_feats = self.bbox_roi_extractor( - x[:self.bbox_roi_extractor.num_inputs], - rois, - roi_scale_factor=self.reg_roi_scale_factor) - if self.with_shared_head: - bbox_cls_feats = self.shared_head(bbox_cls_feats) - bbox_reg_feats = self.shared_head(bbox_reg_feats) - cls_score, bbox_pred = self.bbox_head(bbox_cls_feats, bbox_reg_feats) + def __init__(self, + semantic_head, + num_classes=3, + seg_roi_extractor=None, + part_roi_extractor=None, + bbox_head=None, + train_cfg=None, + test_cfg=None): + super(PartAggregationROIHead, self).__init__( + bbox_head=bbox_head, train_cfg=train_cfg, test_cfg=test_cfg) + self.num_classes = num_classes + assert semantic_head is not None + self.semantic_head = build_head(semantic_head) + + if seg_roi_extractor is not None: + self.seg_roi_extractor = build_roi_extractor(seg_roi_extractor) + if part_roi_extractor is not None: + self.part_roi_extractor = build_roi_extractor(part_roi_extractor) + + self.init_assigner_sampler() + + def _bbox_forward(self, seg_feats, part_feats, voxels_dict, rois): + """Forward function of roi_extractor and bbox_head used in both + training and testing. + Args: + seg_feats (torch.Tensor): Point-wise semantic features. + part_feats (torch.Tensor): Point-wise part prediction features. + voxels_dict (dict): Contains information of voxels. + rois (Tensor): Roi boxes. + Returns: + dict: Contains predictions of bbox_head and + features of roi_extractor. + """ + pooled_seg_feats = self.seg_roi_extractor(seg_feats, + voxels_dict['voxel_centers'], + voxels_dict['coors'][..., 0], + rois) + pooled_part_feats = self.part_roi_extractor( + part_feats, voxels_dict['voxel_centers'], + voxels_dict['coors'][..., 0], rois) + cls_score, bbox_pred = self.bbox_head(pooled_seg_feats, + pooled_part_feats) bbox_results = dict( cls_score=cls_score, bbox_pred=bbox_pred, - bbox_feats=bbox_cls_feats) + pooled_seg_feats=pooled_seg_feats, + pooled_part_feats=pooled_part_feats) return bbox_results ``` +Here we omit more details related to other functions. Please see the [code](mmdet3d/models/roi_heads/part_aggregation_roi_head.py) for more details. + Last, the users need to add the module in -`mmdet/models/bbox_heads/__init__.py` and `mmdet/models/roi_heads/__init__.py` thus the corresponding registry could find and load them. +`mmdet3d/models/bbox_heads/__init__.py` and `mmdet3d/models/roi_heads/__init__.py` thus the corresponding registry could find and load them. Alternatively, the users can add ```python custom_imports=dict( - imports=['mmdet.models.roi_heads.double_roi_head', 'mmdet.models.bbox_heads.double_bbox_head']) + imports=['mmdet3d.models.roi_heads.part_aggregation_roi_head', 'mmdet3d.models.bbox_heads.parta2_bbox_head']) ``` to the config file and achieve the same goal. -The config file of Double Head R-CNN is as the following +The config file of PartAggregationROIHead is as the following ```python -_base_ = '../faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py' model = dict( + ... 
roi_head=dict( - type='DoubleHeadRoIHead', - reg_roi_scale_factor=1.3, + type='PartAggregationROIHead', + num_classes=3, + semantic_head=dict( + type='PointwiseSemanticHead', + in_channels=16, + extra_width=0.2, + seg_score_thr=0.3, + num_classes=3, + loss_seg=dict( + type='FocalLoss', + use_sigmoid=True, + reduction='sum', + gamma=2.0, + alpha=0.25, + loss_weight=1.0), + loss_part=dict( + type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)), + seg_roi_extractor=dict( + type='Single3DRoIAwareExtractor', + roi_layer=dict( + type='RoIAwarePool3d', + out_size=14, + max_pts_per_voxel=128, + mode='max')), + part_roi_extractor=dict( + type='Single3DRoIAwareExtractor', + roi_layer=dict( + type='RoIAwarePool3d', + out_size=14, + max_pts_per_voxel=128, + mode='avg')), bbox_head=dict( - _delete_=True, - type='DoubleConvFCBBoxHead', - num_convs=4, - num_fcs=2, - in_channels=256, - conv_out_channels=1024, - fc_out_channels=1024, - roi_feat_size=7, - num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), - reg_class_agnostic=False, + type='PartA2BboxHead', + num_classes=3, + seg_in_channels=16, + part_in_channels=4, + seg_conv_channels=[64, 64], + part_conv_channels=[64, 64], + merge_conv_channels=[128, 128], + down_conv_channels=[128, 256], + bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'), + shared_fc_channels=[256, 512, 512, 512], + cls_channels=[256, 256], + reg_channels=[256, 256], + dropout_ratio=0.1, + roi_feat_size=14, + with_corner_loss=True, + loss_bbox=dict( + type='SmoothL1Loss', + beta=1.0 / 9.0, + reduction='sum', + loss_weight=1.0), loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=2.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=2.0)))) - + type='CrossEntropyLoss', + use_sigmoid=True, + reduction='sum', + loss_weight=1.0))) + ... + ) ``` Since MMDetection 2.0, the config system supports to inherit configs such that the users can focus on the modification. -The Double Head R-CNN mainly uses a new DoubleHeadRoIHead and a new -`DoubleConvFCBBoxHead`, the arguments are set according to the `__init__` function of each module. +The second stage of PartA2 Head mainly uses a new PartAggregationROIHead and a new +`PartA2BboxHead`, the arguments are set according to the `__init__` function of each module. ### Add new loss Assume you want to add a new loss as `MyLoss`, for bounding box regression. -To add a new loss function, the users need implement it in `mmdet/models/losses/my_loss.py`. +To add a new loss function, the users need implement it in `mmdet3d/models/losses/my_loss.py`. The decorator `weighted_loss` enable the loss to be weighted for each element. ```python @@ -403,7 +500,7 @@ class MyLoss(nn.Module): return loss_bbox ``` -Then the users need to add it in the `mmdet/models/losses/__init__.py`. +Then the users need to add it in the `mmdet3d/models/losses/__init__.py`. ```python from .my_loss import MyLoss, my_loss @@ -414,7 +511,7 @@ Alternatively, you can add ```python custom_imports=dict( - imports=['mmdet.models.losses.my_loss']) + imports=['mmdet3d.models.losses.my_loss']) ``` to the config file and achieve the same goal. 
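As a quick sanity check of the decorated loss above, the following self-contained sketch (plain PyTorch, no registry involved; the tensors and the `avg_factor` choice are illustrative) shows what the weighted, reduced value works out to:

```python
import torch

# Illustrative tensors; the third element is masked out by a zero weight.
pred = torch.tensor([1.0, 2.0, 3.0, 4.0])
target = torch.tensor([1.5, 2.0, 2.0, 4.0])
weight = torch.tensor([1.0, 1.0, 0.0, 1.0])

elementwise = torch.abs(pred - target)   # what my_loss computes before reduction
avg_factor = weight.sum()                # e.g. the number of valid elements
reduced = (elementwise * weight).sum() / avg_factor

print(elementwise)  # tensor([0.5000, 0.0000, 1.0000, 0.0000])
print(reduced)      # tensor(0.1667)
```

This is roughly the value one would expect from `my_loss(pred, target, weight, avg_factor=3)` with the default `mean` reduction, since the `weighted_loss` decorator applies the element-wise weight before dividing by `avg_factor`.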
From 6b9f96c8cad54cbe9cd7035867dfabf76ac803df Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Tue, 22 Dec 2020 20:56:32 +0800 Subject: [PATCH 30/43] Update useful_tools.md --- docs/useful_tools.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/useful_tools.md b/docs/useful_tools.md index 663f305fa9..57069816f8 100644 --- a/docs/useful_tools.md +++ b/docs/useful_tools.md @@ -70,7 +70,7 @@ You can use 3D visualization software such as the [MeshLab](http://www.meshlab.n ## Model Complexity -`tools/get_flops.py` is a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch) to compute the FLOPs and params of a given model. +You can use `tools/get_flops.py` in MMDetection, a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch), to compute the FLOPs and params of a given model. ```shell python tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}] @@ -108,7 +108,7 @@ python tools/regnet2mmdet.py ${SRC} ${DST} [-h] ### Detectron ResNet to Pytorch -`tools/detectron2pytorch.py` converts keys in the original detectron pretrained +`tools/detectron2pytorch.py` in MMDetection could convert keys in the original detectron pretrained ResNet models to PyTorch style. ```shell From 8cdda48ed039de5b27a79a9e2d2a3e419065bd8f Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Tue, 22 Dec 2020 21:27:55 +0800 Subject: [PATCH 31/43] Update title --- docs/data_preparation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/data_preparation.md b/docs/data_preparation.md index 1e1aa303b3..52e68b103f 100644 --- a/docs/data_preparation.md +++ b/docs/data_preparation.md @@ -1,4 +1,4 @@ -# Data Preparation +# Dataset Preparation It is recommended to symlink the dataset root to `$MMDETECTION3D/data`. If your folder structure is different from the following, you may need to change the corresponding paths in config files. From 8098a1016080ed127727fb215e486fdf72134d71 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Tue, 22 Dec 2020 21:29:20 +0800 Subject: [PATCH 32/43] Fix a typo of tutorial 4 --- docs/tutorials/index.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/index.rst b/docs/tutorials/index.rst index 5aa46b3a00..b5e1137a67 100644 --- a/docs/tutorials/index.rst +++ b/docs/tutorials/index.rst @@ -4,6 +4,6 @@ config.md customize_dataset.md data_pipeline.md - customize_model.md + customize_models.md customize_runtime.md waymo.md From f3e802c860b7211c52aae3b410351636b18389df Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Tue, 22 Dec 2020 21:30:49 +0800 Subject: [PATCH 33/43] Update level of titles --- docs/useful_tools.md | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/docs/useful_tools.md b/docs/useful_tools.md index 57069816f8..0b43d7ee3c 100644 --- a/docs/useful_tools.md +++ b/docs/useful_tools.md @@ -1,8 +1,6 @@ -# Useful Tools and Scripts - We provide lots of useful tools under `tools/` directory. -## Log Analysis +# Log Analysis You can plot loss/mAP curves given a training log file. Run `pip install seaborn` first to install the dependency. 
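Under the hood these curves come from the json log file written by the text logger during training. If you ever need a custom analysis, a minimal sketch for reading such a log could look like the following; the path is illustrative, and the one-json-dict-per-line format assumes the default json output of the text logger.

```python
import json

log_file = 'work_dirs/example/20201221_000000.log.json'  # illustrative path

train_records = []
with open(log_file) as f:
    for line in f:
        entry = json.loads(line)
        # one dict per logged step; skip the metadata line and validation entries
        if entry.get('mode') == 'train' and 'loss' in entry:
            train_records.append((entry['epoch'], entry['iter'], entry['loss']))

print(train_records[:3])
```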
@@ -48,7 +46,7 @@ time std over epochs is 0.0028 average iter time: 1.1959 s/iter ``` -## Visualization +# Visualization To see the SUNRGBD, ScanNet or KITTI points and detection results, you can run the following command @@ -68,7 +66,7 @@ You can use 3D visualization software such as the [MeshLab](http://www.meshlab.n **Notice**: The visualization API is a little unstable since we plan to refactor these parts together with MMDetection in the future. -## Model Complexity +# Model Complexity You can use `tools/get_flops.py` in MMDetection, a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch), to compute the FLOPs and params of a given model. @@ -95,9 +93,9 @@ Params: 37.74 M 2. Some operators are not counted into FLOPs like GN and custom operators. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/cnn/utils/flops_counter.py) for details. 3. The FLOPs of two-stage detectors is dependent on the number of proposals. -## Model Conversion +# Model Conversion -### RegNet model to MMDetection +## RegNet model to MMDetection `tools/regnet2mmdet.py` convert keys in pycls pretrained RegNet models to MMDetection style. @@ -106,7 +104,7 @@ Params: 37.74 M python tools/regnet2mmdet.py ${SRC} ${DST} [-h] ``` -### Detectron ResNet to Pytorch +## Detectron ResNet to Pytorch `tools/detectron2pytorch.py` in MMDetection could convert keys in the original detectron pretrained ResNet models to PyTorch style. @@ -115,7 +113,7 @@ python tools/regnet2mmdet.py ${SRC} ${DST} [-h] python tools/detectron2pytorch.py ${SRC} ${DST} ${DEPTH} [-h] ``` -### Prepare a model for publishing +## Prepare a model for publishing `tools/publish_model.py` helps users to prepare their model for publishing. @@ -138,13 +136,13 @@ python tools/publish_model.py work_dirs/faster_rcnn/latest.pth faster_rcnn_r50_f The final output filename will be `faster_rcnn_r50_fpn_1x_20190801-{hash id}.pth`. -## Dataset Conversion +# Dataset Conversion TBD -## Miscellaneous +# Miscellaneous -### Print the entire config +## Print the entire config `tools/print_config.py` prints the whole config verbatim, expanding all its imports. From 70f9b4c31535f148e98602f7fb36c5ec7022a1de Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Sat, 26 Dec 2020 11:33:06 +0800 Subject: [PATCH 34/43] Merge verification and demo --- docs/getting_started.md | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/docs/getting_started.md b/docs/getting_started.md index 03b48a3fb2..7269e76046 100644 --- a/docs/getting_started.md +++ b/docs/getting_started.md @@ -172,11 +172,9 @@ PYTHONPATH="$(dirname $0)/..":$PYTHONPATH # Verification -TBD +## Demo -# Demo - -## Point cloud demo +### Point cloud demo We provide a demo script to test a single sample. @@ -233,5 +231,3 @@ result, data = inference_detector(model, point_cloud) # visualize the results and save the results in 'results' folder model.show_results(data, result, out_dir='results') ``` - -A notebook demo can be found in [demo/inference_demo.ipynb](https://github.com/open-mmlab/mmdetection/blob/master/demo/inference_demo.ipynb). 
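If you want to run the same high-level APIs over several point cloud files, a simple loop is enough. The sketch below reuses the APIs from the snippet above; the config, checkpoint and glob pattern are illustrative placeholders.

```python
import glob

from mmdet3d.apis import inference_detector, init_detector

# Illustrative paths; substitute your own config, checkpoint and data.
config_file = 'configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py'
checkpoint_file = 'checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-car.pth'

model = init_detector(config_file, checkpoint_file, device='cuda:0')
for pcd in glob.glob('demo/data/*.bin'):
    # each call returns the predicted boxes and the preprocessed input data
    result, data = inference_detector(model, pcd)
    model.show_results(data, result, out_dir='results')
```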
From 73793f6deae685c508e25eec28eaf7b07eee27c8 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Sat, 26 Dec 2020 11:34:27 +0800 Subject: [PATCH 35/43] Update 1_exist_data_model.md --- docs/1_exist_data_model.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/1_exist_data_model.md b/docs/1_exist_data_model.md index dab1f54041..4dc95be394 100644 --- a/docs/1_exist_data_model.md +++ b/docs/1_exist_data_model.md @@ -4,7 +4,7 @@ Here we provide testing scripts to evaluate a whole dataset (SUNRGBD, ScanNet, KITTI, etc.). -For high-level apis easier to integrated into other projects and basic demos, please refer to Demo under [Get Started](./getting_started.md). +For high-level apis easier to integrated into other projects and basic demos, please refer to Verification/Demo under [Get Started](./getting_started.md). ### Test existing models on standard datasets From af6979469a1381fee7911c040aec3d35c59e85b3 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Sat, 26 Dec 2020 15:31:03 +0800 Subject: [PATCH 36/43] Update [Dataset Conversion] --- docs/useful_tools.md | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/docs/useful_tools.md b/docs/useful_tools.md index 0b43d7ee3c..33d42c71d6 100644 --- a/docs/useful_tools.md +++ b/docs/useful_tools.md @@ -138,7 +138,22 @@ The final output filename will be `faster_rcnn_r50_fpn_1x_20190801-{hash id}.pth # Dataset Conversion -TBD +`tools/data_converter/` contains tools to convert datasets to other formats. Most of them convert datasets to pickle based info files, like kitti, nuscenes and lyft. Waymo converter is used to reorganize waymo raw data like KITTI style. Users could refer to them for our approach to converting data format. It is also convenient to modify them to use as scripts like nuImages converter. + +To convert the nuImages dataset into COCO format, please use the command below: + +```shell +python -u tools/data_converter/nuimage_converter.py --data-root ${DATA_ROOT} --version ${VERIONS} \ + --out-dir ${OUT_DIR} --nproc ${NUM_WORKERS} --extra-tag ${TAG} +``` + +- `--data-root`: the root of the dataset, defaults to `./data/nuimages`. +- `--version`: the version of the dataset, defaults to `v1.0-mini`. To get the full dataset, please use `--version v1.0-train v1.0-val v1.0-mini` +- `--out-dir`: the output directory of annotations and semantic masks, defaults to `./data/nuimages/annotations/`. +- `--nproc`: number of workers for data preparation, defaults to `4`. Larger number could reduce the preparation time as images are processed in parallel. +- `--extra-tag`: extra tag of the annotations, defaults to `nuimages`. This can be used to separate different annotations processed in different time for study. + +More details could be referred to the [doc](data_preparation.md) for dataset preparation and [README](../configs/nuimages/README.md) for nuImages dataset. 
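After the conversion finishes, a quick sanity check is to load one of the generated annotation files with the COCO API. This is only a sketch: the exact file name depends on the `--extra-tag` and `--version` you passed, so adjust the path accordingly.

```python
from pycocotools.coco import COCO

# Illustrative output path produced by the converter; adjust extra-tag/version as needed.
ann_file = './data/nuimages/annotations/nuimages_v1.0-mini.json'

coco = COCO(ann_file)
print('images:', len(coco.getImgIds()))
print('categories:', [cat['name'] for cat in coco.loadCats(coco.getCatIds())])
```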
# Miscellaneous From 60e8e82a687c64e96aa2d29433b7fb156fbb3d49 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Tue, 29 Dec 2020 15:39:25 +0800 Subject: [PATCH 37/43] Update 2_new_data_model.md --- docs/2_new_data_model.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/2_new_data_model.md b/docs/2_new_data_model.md index af837b12e1..65b41fa3d0 100644 --- a/docs/2_new_data_model.md +++ b/docs/2_new_data_model.md @@ -12,7 +12,7 @@ The basic steps are as below: There are three ways to support a new dataset in MMDetection3D: -1. reorganize the dataset into existed format. +1. reorganize the dataset into existing format. 2. reorganize the dataset into a middle format. 3. implement a new dataset. @@ -20,7 +20,7 @@ Usually we recommend to use the first two methods which are usually easier than In this note, we give an example for converting the data into KITTI format. -**Note**: We take Waymo as the example here considering its format is totally different from other existed formats. For other datasets using similar methods to organize data, like Lyft compared to nuScenes, it would be easier to directly implement the new dataset inherited from an existed one. +**Note**: We take Waymo as the example here considering its format is totally different from other existing formats. For other datasets using similar methods to organize data, like Lyft compared to nuScenes, it would be easier to directly implement the new data converter (for the second approach above) instead of converting it to another format (for the first approach above). ### KITTI dataset format From 56ddf93f714575124c9c76a5ffec2f4ff299ac8f Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Tue, 29 Dec 2020 15:40:21 +0800 Subject: [PATCH 38/43] Update customize_dataset.md --- docs/tutorials/customize_dataset.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/tutorials/customize_dataset.md b/docs/tutorials/customize_dataset.md index a8074b9da5..53c15c61cf 100644 --- a/docs/tutorials/customize_dataset.md +++ b/docs/tutorials/customize_dataset.md @@ -3,22 +3,22 @@ ## Support new data format To support a new data format, you can either convert them to existing formats or directly convert them to the middle format. You could also choose to convert them offline (before training by a script) or online (implement a new dataset and do the conversion at training). In MMDetection3D, for the data that is inconvenient to read directly online, we recommend to convert it into KITTI format and do the conversion offline, thus you only need to modify the config's data annotation paths and classes after the conversion. -For data sharing similar format with existed datasets, like Lyft compared to nuScenes, we recommend to directly implement data converter and dataset class. During the procedure, inheritation could be taken into consideration to reduce the implementation workload. +For data sharing similar format with existing datasets, like Lyft compared to nuScenes, we recommend to directly implement data converter and dataset class. During the procedure, inheritation could be taken into consideration to reduce the implementation workload. ### Reorganize new data formats to existing format For data that is inconvenient to read directly online, the simplest way is to convert your dataset to existing dataset formats. 
-Typically we need a data converter to reorganize the raw data and convert the annotation format into KITTI style. Then a new dataset class inherited from existed ones is sometimes necessary for dealing with some specific differences between datasets. Finally, the users need to further modify the config files to use the dataset. An [example](../2_new_data_model.md) training predefined models on Waymo dataset by converting it into KITTI style can be taken for reference. +Typically we need a data converter to reorganize the raw data and convert the annotation format into KITTI style. Then a new dataset class inherited from existing ones is sometimes necessary for dealing with some specific differences between datasets. Finally, the users need to further modify the config files to use the dataset. An [example](../2_new_data_model.md) training predefined models on Waymo dataset by converting it into KITTI style can be taken for reference. ### Reorganize new data format to middle format -It is also fine if you do not want to convert the annotation format to existed formats. +It is also fine if you do not want to convert the annotation format to existing formats. Actually, we convert all the supported datasets into pickle files, which summarize useful information for model training and inference. The annotation of a dataset is a list of dict, each dict corresponds to a frame. A basic example (used in KITTI) is as follows. A frame consists of several keys, like `image`, `point_cloud`, `calib` and `annos`. -As long as we could directly read data according to these information, the organization of raw data could also be different from existed ones. +As long as we could directly read data according to these information, the organization of raw data could also be different from existing ones. With this design, we provide an alternative choice for customizing datasets. ```python From 826aa808b616c14cbb20c9a9e2ba4cae03f29b18 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Tue, 29 Dec 2020 15:54:18 +0800 Subject: [PATCH 39/43] Update the example of customized dataset --- docs/tutorials/customize_dataset.md | 142 ++++++++++++++++------------ 1 file changed, 83 insertions(+), 59 deletions(-) diff --git a/docs/tutorials/customize_dataset.md b/docs/tutorials/customize_dataset.md index 53c15c61cf..7759aa4c37 100644 --- a/docs/tutorials/customize_dataset.md +++ b/docs/tutorials/customize_dataset.md @@ -64,76 +64,100 @@ like [KittiDataset](../../mmdet3d/datasets/kitti_dataset.py) and [ScanNetDataset ### An example of customized dataset -Here we provide an example of customized dataset for image input like MMDetection. The same principle applies to the case in MMDetection3D. +Here we provide an example of customized dataset. -Assume the annotation is in a new format in text files. -The bounding boxes annotations are stored in text file `annotation.txt` as the following +Assume the annotation has been reorganized into a list of dict in pickle files like ScanNet. 
+The bounding boxes annotations are stored in `annotation.pkl` as the following ``` -# -000001.jpg -1280 720 -2 -10 20 40 60 1 -20 40 50 60 2 -# -000002.jpg -1280 720 -3 -50 20 40 60 2 -20 40 30 45 2 -30 40 50 60 3 +{'point_cloud': {'num_features': 6, 'lidar_idx': 'scene0000_00'}, 'pts_path': 'points/scene0000_00.bin', + 'pts_instance_mask_path': 'instance_mask/scene0000_00.bin', 'pts_semantic_mask_path': 'semantic_mask/scene0000_00.bin', + 'annos': {'gt_num': 27, 'name': array(['window', 'window', 'table', 'counter', 'curtain', 'curtain', + 'desk', 'cabinet', 'sink', 'garbagebin', 'garbagebin', + 'garbagebin', 'sofa', 'refrigerator', 'table', 'table', 'toilet', + 'bed', 'cabinet', 'cabinet', 'cabinet', 'cabinet', 'cabinet', + 'cabinet', 'door', 'door', 'door'], dtype=' Date: Tue, 29 Dec 2020 15:55:06 +0800 Subject: [PATCH 40/43] Update customize_dataset.md --- docs/tutorials/customize_dataset.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/tutorials/customize_dataset.md b/docs/tutorials/customize_dataset.md index 7759aa4c37..0b12a380a0 100644 --- a/docs/tutorials/customize_dataset.md +++ b/docs/tutorials/customize_dataset.md @@ -104,7 +104,6 @@ from .custom_3d import Custom3DDataset @DATASETS.register_module() class MyDataset(Custom3DDataset): - """ CLASSES = ('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window', 'bookshelf', 'picture', 'counter', 'desk', 'curtain', 'refrigerator', 'showercurtrain', 'toilet', 'sink', 'bathtub', From 6b116e89fcb1d5d8dc0f444bc1d52809956f04f3 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Wed, 30 Dec 2020 15:59:08 +0800 Subject: [PATCH 41/43] Enhance the doc structure --- docs/data_preparation.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/docs/data_preparation.md b/docs/data_preparation.md index 52e68b103f..6abef5de00 100644 --- a/docs/data_preparation.md +++ b/docs/data_preparation.md @@ -1,5 +1,7 @@ # Dataset Preparation +## Before Preparation + It is recommended to symlink the dataset root to `$MMDETECTION3D/data`. If your folder structure is different from the following, you may need to change the corresponding paths in config files. @@ -65,6 +67,10 @@ mmdetection3d ``` +## Download and Data Preparation + +### KITTI + Download KITTI 3D detection data [HERE](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d). Prepare kitti data by running ```bash @@ -79,6 +85,8 @@ wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/sec python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti ``` +### Waymo + Download Waymo open dataset V1.2 [HERE](https://waymo.com/open/download/) and its data split [HERE](https://drive.google.com/drive/folders/18BVuF_RYJF0NjZpt8SnfzANiakoRMf0o?usp=sharing). Then put tfrecord files into corresponding folders in `data/waymo/waymo_format/` and put the data split txt files into `data/waymo/kitti_format/ImageSets`. Download ground truth bin file for validation set [HERE](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects) and put it into `data/waymo/waymo_format/`. A tip is that you can use `gsutil` to download the large-scale dataset with commands. You can take this [tool](https://github.com/RalphMao/Waymo-Dataset-Tool) as an example for more details. 
Subsequently, prepare waymo data by running ```bash @@ -87,12 +95,16 @@ python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/way Note that if your local disk does not have enough space for saving converted data, you can change the `out-dir` to anywhere else. Just remember to create folders and prepare data there in advance and link them back to `data/waymo/kitti_format` after the data conversion. +### NuScenes + Download nuScenes V1.0 full dataset data [HERE]( https://www.nuscenes.org/download). Prepare nuscenes data by running ```bash python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes ``` +### Lyft + Download Lyft 3D detection data [HERE](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data). Prepare Lyft data by running ```bash @@ -101,8 +113,12 @@ python tools/create_data.py lyft --root-path ./data/lyft --out-dir ./data/lyft - Note that we follow the original folder names for clear organization. Please rename the raw folders as shown above. +### ScanNet and SUN RGB-D + To prepare scannet data, please see [scannet](https://github.com/open-mmlab/mmdetection3d/blob/master/data/scannet/README.md). To prepare sunrgbd data, please see [sunrgbd](https://github.com/open-mmlab/mmdetection3d/blob/master/data/sunrgbd/README.md). +### Customized Datasets + For using custom datasets, please refer to [Tutorials 2: Customize Datasets](tutorials/new_dataset.md). From f59ace44e9298b4fe4334a0d6cacac511bb15c72 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Wed, 30 Dec 2020 16:02:41 +0800 Subject: [PATCH 42/43] Update README.md --- data/scannet/README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/data/scannet/README.md b/data/scannet/README.md index ccb6e30654..4669034e77 100644 --- a/data/scannet/README.md +++ b/data/scannet/README.md @@ -28,8 +28,11 @@ scannet ├── scans ├── scannet_train_instance_data ├── points +│ ├── xxxxx.bin ├── instance_mask +│ ├── xxxxx.bin ├── semantic_mask +│ ├── xxxxx.bin ├── scannet_infos_train.pkl ├── scannet_infos_val.pkl From dc7b3456afe2133448cac1f8ec101a5d899ee551 Mon Sep 17 00:00:00 2001 From: twang <30491025+Tai-Wang@users.noreply.github.com> Date: Wed, 30 Dec 2020 16:03:47 +0800 Subject: [PATCH 43/43] Change the abs path to relative path --- docs/data_preparation.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/data_preparation.md b/docs/data_preparation.md index 6abef5de00..6384ce8d42 100644 --- a/docs/data_preparation.md +++ b/docs/data_preparation.md @@ -115,9 +115,9 @@ Note that we follow the original folder names for clear organization. Please ren ### ScanNet and SUN RGB-D -To prepare scannet data, please see [scannet](https://github.com/open-mmlab/mmdetection3d/blob/master/data/scannet/README.md). +To prepare scannet data, please see [scannet](../data/scannet/README.md). -To prepare sunrgbd data, please see [sunrgbd](https://github.com/open-mmlab/mmdetection3d/blob/master/data/sunrgbd/README.md). +To prepare sunrgbd data, please see [sunrgbd](../data/sunrgbd/README.md). ### Customized Datasets