[Doc] Add SUN RGB-D doc #770

Merged 20 commits on Jul 30, 2021. Changes shown from 18 commits.
8 changes: 4 additions & 4 deletions data/sunrgbd/README.md
@@ -2,7 +2,7 @@

We follow the procedure in [votenet](https://github.com/facebookresearch/votenet/).

- 1. Download SUNRGBD v2 data [HERE](http://rgbd.cs.princeton.edu/data/). Then, move SUNRGBD.zip, SUNRGBDMeta2DBB_v2.mat, SUNRGBDMeta3DBB_v2.mat and SUNRGBDtoolbox.zip to the OFFICIAL_SUNRGBD folder, unzip the zip files.
+ 1. Download SUNRGBD data [HERE](http://rgbd.cs.princeton.edu/data/). Then move SUNRGBD.zip, SUNRGBDMeta2DBB_v2.mat, SUNRGBDMeta3DBB_v2.mat and SUNRGBDtoolbox.zip to the OFFICIAL_SUNRGBD folder and unzip the zip files.

2. Enter the `matlab` folder and extract point clouds and annotations by running `extract_split.m`, `extract_rgbd_data_v2.m` and `extract_rgbd_data_v1.m` (a quick file-check sketch follows below).
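Before running the MATLAB scripts, it can help to confirm that everything from step 1 landed in the right place. Below is a minimal sketch, assuming the `OFFICIAL_SUNRGBD` folder sits in the current working directory; adjust the path to your checkout.

```python
from pathlib import Path

# Files that step 1 asks you to place in OFFICIAL_SUNRGBD; the relative
# path is an assumption, adjust it to where your data actually lives.
root = Path("OFFICIAL_SUNRGBD")
expected = [
    "SUNRGBD.zip",
    "SUNRGBDMeta2DBB_v2.mat",
    "SUNRGBDMeta3DBB_v2.mat",
    "SUNRGBDtoolbox.zip",
]

missing = [name for name in expected if not (root / name).exists()]
if missing:
    raise FileNotFoundError(f"missing downloads: {missing}")
print("All SUN RGB-D downloads found; ready to unzip and run the MATLAB scripts.")
```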

@@ -47,12 +47,12 @@ sunrgbd
│ ├── SUNRGBDtoolbox
├── sunrgbd_trainval
│ ├── calib
- │ ├── image
- │ ├── label_v1
- │ ├── train_data_idx.txt
│ ├── depth
+ │ ├── image
│ ├── label
+ │ ├── label_v1
│ ├── seg_label
+ │ ├── train_data_idx.txt
│ ├── val_data_idx.txt
├── points
├── sunrgbd_infos_train.pkl
1 change: 1 addition & 0 deletions docs/datasets/index.rst
@@ -2,6 +2,7 @@
:maxdepth: 2

waymo_det.md
+ sunrgbd_det.md
scannet_det.md
scannet_sem_seg.md
s3dis_sem_seg.md
32 changes: 16 additions & 16 deletions docs/datasets/scannet_det.md
@@ -113,7 +113,7 @@ def export(mesh_file,
# bbox format is [x, y, z, dx, dy, dz, label_id]
# [x, y, z] is gravity center of bbox, [dx, dy, dz] is axis-aligned
# [label_id] is semantic label id in 'nyu40id' standard
- # Note: since 3d bbox is axis-aligned, the yaw is 0.
+ # Note: since 3D bbox is axis-aligned, the yaw is 0.
unaligned_bboxes = extract_bbox(mesh_vertices, object_id_to_segs,
object_id_to_label_id, instance_ids)
aligned_bboxes = extract_bbox(aligned_mesh_vertices, object_id_to_segs,
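To make the layout in the comment above concrete, here is a tiny illustrative array; the values are invented for the example, and only the `[x, y, z, dx, dy, dz, label_id]` ordering comes from the export code.

```python
import numpy as np

# One exported ScanNet box: gravity center (x, y, z), axis-aligned sizes
# (dx, dy, dz), and the nyu40id semantic label (5 = chair). Values invented.
bbox = np.array([[1.20, 0.35, 0.45, 0.60, 0.60, 0.90, 5.0]])
assert bbox.shape == (1, 7)
# Axis-aligned means no yaw entry is stored; consumers treat the heading as 0.
```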
@@ -221,7 +221,7 @@ scannet
├── scannet_infos_test.pkl
```

- - `points/xxxxx.bin`: The `axis-unaligned` point cloud data after downsample. Since ScanNet 3d detection task takes axis-aligned point clouds as input, while ScanNet 3d semantic segmentation task takes unaligned points, we choose to store unaligned points and their axis-align transform matrix. Note: the points would be axis-aligned in pre-processing pipeline `GlobalAlignment` of 3d detection task.
+ - `points/xxxxx.bin`: The `axis-unaligned` point cloud data after downsampling. Since the ScanNet 3D detection task takes axis-aligned point clouds as input, while the ScanNet 3D semantic segmentation task takes unaligned points, we store the unaligned points together with their axis-alignment transform matrix. Note: the points are axis-aligned by the `GlobalAlignment` pre-processing step of the 3D detection pipeline.
- `instance_mask/xxxxx.bin`: The instance label for each point, value range: [0, NUM_INSTANCES], 0: unannotated.
- `semantic_mask/xxxxx.bin`: The semantic label for each point, value range: [1, 40], i.e. `nyu40id` standard. Note: the `nyu40id` id will be mapped to train id in train pipeline `PointSegClassMapping`.
- `posed_images/scenexxxx_xx`: The set of `.jpg` images with `.txt` 4x4 pose files and a single `.txt` file with the camera intrinsic matrix (see the loading sketch below for the `.bin` files).
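For readers who want to inspect these files directly, here is a small loading sketch. The float32/int64 dtypes, the 6-value point layout, and the example scene name are assumptions about the exported binaries rather than something this doc states; check the data-conversion scripts before relying on them.

```python
import numpy as np

# Assumed layout of the exported binaries; verify against the converter
# scripts if the reshape below fails. "scene0000_00" is an example name.
points = np.fromfile("points/scene0000_00.bin", dtype=np.float32)
points = points.reshape(-1, 6)  # x, y, z plus per-point color (assumed)

instance_mask = np.fromfile("instance_mask/scene0000_00.bin", dtype=np.int64)
semantic_mask = np.fromfile("semantic_mask/scene0000_00.bin", dtype=np.int64)

# Every point should carry one instance id and one nyu40id semantic label.
assert len(points) == len(instance_mask) == len(semantic_mask)
print(points.shape, instance_mask.max(), np.unique(semantic_mask))
```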
@@ -231,21 +231,21 @@ scannet
- info['pts_instance_mask_path']: The path of `instance_mask/xxxxx.bin`.
- info['pts_semantic_mask_path']: The path of `semantic_mask/xxxxx.bin`.
- info['annos']: The annotations of each scan.
- - annotations['gt_num']: The number of ground truth.
+ - annotations['gt_num']: The number of ground truths.
- annotations['name']: The semantic name of all ground truths, e.g. `chair`.
- - annotations['location']: The gravity center of axis-aligned 3d bounding box. Shape: [K, 3], K is the number of ground truth.
- - annotations['dimensions']: The dimensions of axis-aligned 3d bounding box, i.e. x_size, y_size, z_size, shape: [K, 3].
- - annotations['gt_boxes_upright_depth']: Axis-aligned 3d bounding box, each bounding box is x, y, z, x_size, y_size, z_size, shape: [K, 6].
- - annotations['unaligned_location']: The gravity center of axis-unaligned 3d bounding box.
- - annotations['unaligned_dimensions']: The dimensions of axis-unaligned 3d bounding box.
- - annotations['unaligned_gt_boxes_upright_depth']: Axis-unaligned 3d bounding box.
+ - annotations['location']: The gravity center of the axis-aligned 3D bounding boxes. Shape: [K, 3], where K is the number of ground truths.
+ - annotations['dimensions']: The dimensions of the axis-aligned 3D bounding boxes, i.e. (x_size, y_size, z_size), shape: [K, 3].
+ - annotations['gt_boxes_upright_depth']: The axis-aligned 3D bounding boxes, each bounding box is (x, y, z, x_size, y_size, z_size), shape: [K, 6].
+ - annotations['unaligned_location']: The gravity center of the axis-unaligned 3D bounding boxes.
+ - annotations['unaligned_dimensions']: The dimensions of the axis-unaligned 3D bounding boxes.
+ - annotations['unaligned_gt_boxes_upright_depth']: The axis-unaligned 3D bounding boxes.
- annotations['index']: The index of all ground truths, i.e. [0, K).
- - annotations['class']: The train class id of each bounding box, value range: [0, 18), shape: [K, ].
+ - annotations['class']: The train class id of the bounding boxes, value range: [0, 18), shape: [K, ] (see the loading sketch below).
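To make these keys concrete, a short sketch of reading the generated info file with the standard library is below; the list-of-dicts structure is how these pickles are usually laid out in this codebase, but treat it as an assumption and inspect the file if in doubt.

```python
import pickle

# scannet_infos_train.pkl (see the directory tree above) is assumed to be a
# pickled list with one dict per scan; mmcv.load reads it the same way.
with open("scannet_infos_train.pkl", "rb") as f:
    infos = pickle.load(f)

annos = infos[0]["annos"]
print(annos["gt_num"], annos["name"][:3])          # K, first few class names
print(annos["gt_boxes_upright_depth"].shape)       # (K, 6): x, y, z + sizes
print(annos["class"].min(), annos["class"].max())  # train ids within [0, 18)
```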


## Training pipeline

- A typical training pipeline of ScanNet for 3d detection is as below.
+ A typical training pipeline of ScanNet for 3D detection is as follows.

```python
train_pipeline = [
@@ -291,12 +291,12 @@ train_pipeline = [
- `GlobalAlignment`: The previous point cloud would be axis-aligned using the axis-aligned matrix.
- `PointSegClassMapping`: Only the valid category ids will be mapped to class label ids like [0, 18) during training.
- Data augmentation:
- - `IndoorPointSample`: downsample input point cloud.
- - `RandomFlip3D`: randomly flip input point cloud horizontally or vertically.
- - `GlobalRotScaleTrans`: rotate input point cloud, usually [-5, 5] degree.
+ - `IndoorPointSample`: downsample the input point cloud.
+ - `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
+ - `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range of [-5, 5] degrees for ScanNet; then scale it, usually by 1.0 for ScanNet; finally translate it, usually by 0 for ScanNet (a config sketch follows below).
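A hedged sketch of how these three augmentations could look inside `train_pipeline`, with the numbers taken from the bullets above ([-5, 5] degrees is roughly ±0.087266 rad); the keyword names follow common mmdet3d conventions and `num_points` is an invented value, so check both against the actual configs.

```python
# Plain-Python sketch of the augmentation entries; rot_range is in radians,
# so [-5, 5] degrees becomes approximately [-0.087266, 0.087266].
augmentations = [
    dict(type='IndoorPointSample', num_points=40000),  # num_points invented
    dict(type='RandomFlip3D',
         sync_2d=False,
         flip_ratio_bev_horizontal=0.5,   # flip ratios assumed
         flip_ratio_bev_vertical=0.5),
    dict(type='GlobalRotScaleTrans',
         rot_range=[-0.087266, 0.087266],  # [-5, 5] degrees
         scale_ratio_range=[1.0, 1.0],     # scale by 1.0, i.e. unchanged
         translation_std=[0, 0, 0]),       # no translation
]
```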

## Metrics

- Typically mean average precision (mAP) is used for evaluation on ScanNet, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic functions to compute precision and recall for 3d object detection for multiple classes is called, please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).
+ Typically mean Average Precision (mAP) is used for evaluation on ScanNet, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function that computes precision and recall for multi-class 3D object detection is called; please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).
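As a rough illustration of what such an evaluation computes (a generic sketch, not the `indoor_eval` code), the following derives average precision for one class from score-sorted, IoU-thresholded matches:

```python
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """AP for one class at a fixed 3D IoU threshold (e.g. 0.25 or 0.5).

    scores: (N,) detection confidences; is_true_positive: (N,) flags marking
    detections matched to a previously unmatched ground-truth box; num_gt:
    total number of ground-truth boxes for this class.
    """
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / max(num_gt, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-9)
    # All-point interpolation: make precision non-increasing in recall,
    # then integrate precision over the recall steps.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap

# Toy check: three detections, two true positives, two GT boxes -> AP ~ 0.83.
print(average_precision([0.9, 0.8, 0.3], [True, False, True], num_gt=2))
```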

- As introduced in section `Export ScanNet data`, all ground truth 3d bounding box are axis-aligned, i.e. the yaw is zero. So the yaw target of network predicted 3d bounding box is also zero and axis-aligned 3d non-maximum suppression (NMS) is adopted during post-processing without reagrd to rotation.
+ As introduced in the section `Export ScanNet data`, all ground truth 3D bounding boxes are axis-aligned, i.e. the yaw is zero. So the yaw target of the network-predicted 3D bounding boxes is also zero, and axis-aligned 3D non-maximum suppression (NMS) is adopted during post-processing without regard to rotation.
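Because the boxes carry no rotation, the suppression step reduces to overlap tests between axis-aligned cuboids. A self-contained sketch of that idea, assuming the (x, y, z, dx, dy, dz) gravity-center layout from above and not mirroring the actual mmdet3d implementation:

```python
import numpy as np

def aligned_iou_3d(box, boxes):
    """IoU between one axis-aligned box and many, each given as
    (x, y, z, dx, dy, dz) with (x, y, z) the gravity center."""
    lo1, hi1 = box[:3] - box[3:] / 2, box[:3] + box[3:] / 2
    lo2, hi2 = boxes[:, :3] - boxes[:, 3:] / 2, boxes[:, :3] + boxes[:, 3:] / 2
    edges = np.clip(np.minimum(hi1, hi2) - np.maximum(lo1, lo2), 0, None)
    inter = np.prod(edges, axis=1)
    union = np.prod(box[3:]) + np.prod(boxes[:, 3:], axis=1) - inter
    return inter / np.maximum(union, 1e-9)

def aligned_nms_3d(boxes, scores, iou_thr=0.25):
    """Greedy NMS over axis-aligned 3D boxes; rotation is ignored entirely."""
    order = np.argsort(-scores)
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        ious = aligned_iou_3d(boxes[i], boxes[order[1:]])
        order = order[1:][ious <= iou_thr]
    return keep

# Toy usage: the first two boxes overlap heavily, the third is far away.
boxes = np.array([[0, 0, 0, 2, 2, 2],
                  [0.1, 0, 0, 2, 2, 2],
                  [5, 5, 5, 1, 1, 1]], dtype=float)
print(aligned_nms_3d(boxes, np.array([0.9, 0.8, 0.7])))  # -> [0, 2]
```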