@@ -12,7 +12,7 @@ Database for Pedestrian Detection and
 Segmentation <https://www.cis.upenn.edu/~jshi/ped_html/>`__. It contains
 170 images with 345 instances of pedestrians, and we will use it to
 illustrate how to use the new features in torchvision in order to train
-an object detection model on a custom dataset.
+an object detection and instance segmentation model on a custom dataset.

 Defining the Dataset
 --------------------
@@ -26,22 +26,23 @@ adding new custom datasets. The dataset should inherit from the standard
 The only specificity that we require is that the dataset ``__getitem__``
 should return a tuple:

-- image: ``torchvision.datapoints.Image[3, H, W]`` or a PIL Image of size ``(H, W)``
+- image: :class:`torchvision.datapoints.Image` of shape ``[3, H, W]`` or a PIL Image of size ``(H, W)``
 - target: a dict containing the following fields

-  - ``boxes (torchvision.datapoints.BoundingBoxes[N, 4])``: the coordinates of the ``N``
-    bounding boxes in ``[x0, y0, x1, y1]`` format, ranging from ``0``
+  - ``boxes``, :class:`torchvision.datapoints.BoundingBoxes` of shape ``[N, 4]``:
+    the coordinates of the ``N`` bounding boxes in ``[x0, y0, x1, y1]`` format, ranging from ``0``
     to ``W`` and ``0`` to ``H``
-  - ``labels (Int64Tensor[N])``: the label for each bounding box. ``0`` represents always the background class.
-  - ``image_id (int)``: an image identifier. It should be
+  - ``labels``, integer :class:`torch.Tensor` of shape ``[N]``: the label for each bounding box.
+    ``0`` always represents the background class.
+  - ``image_id``, int: an image identifier. It should be
     unique between all the images in the dataset, and is used during
     evaluation
-  - ``area (Float32Tensor[N])``: The area of the bounding box. This is used
+  - ``area``, float :class:`torch.Tensor` of shape ``[N]``: the area of the bounding box. This is used
     during evaluation with the COCO metric, to separate the metric
     scores between small, medium and large boxes.
-  - ``iscrowd (UInt8Tensor[N])``: instances with iscrowd=True will be
+  - ``iscrowd``, uint8 :class:`torch.Tensor` of shape ``[N]``: instances with iscrowd=True will be
     ignored during evaluation.
-  - (optionally) ``masks (torchvision.datapoints.Mask[N, H, W])``: The segmentation
+  - (optionally) ``masks``, :class:`torchvision.datapoints.Mask` of shape ``[N, H, W]``: the segmentation
     masks for each one of the objects

 If your dataset is compliant with above requirements then it will work for both
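The ``__getitem__`` contract described above can be sketched with plain Python containers standing in for tensors. The ``validate_target`` helper below is hypothetical (not part of the tutorial or torchvision), and real code would return torch tensors and torchvision datapoints rather than lists, but it shows which fields and shape invariants the target dict must satisfy:

```python
def validate_target(target, image_width, image_height):
    """Check that a target dict matches the fields the tutorial requires."""
    required = {"boxes", "labels", "image_id", "area", "iscrowd"}
    missing = required - target.keys()
    if missing:
        raise ValueError(f"target is missing fields: {sorted(missing)}")
    # boxes are [x0, y0, x1, y1], bounded by the image size
    for x0, y0, x1, y1 in target["boxes"]:
        assert 0 <= x0 <= x1 <= image_width
        assert 0 <= y0 <= y1 <= image_height
    # labels, area and iscrowd are all per-box, so lengths must agree
    n = len(target["boxes"])
    assert len(target["labels"]) == n
    assert len(target["area"]) == n
    assert len(target["iscrowd"]) == n
    return True

example_target = {
    "boxes": [[10.0, 20.0, 50.0, 80.0]],
    "labels": [1],  # 0 is reserved for the background class
    "image_id": 0,
    "area": [(50.0 - 10.0) * (80.0 - 20.0)],
    "iscrowd": [0],
}
print(validate_target(example_target, image_width=100, image_height=100))  # True
```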
@@ -97,12 +98,16 @@ Here is one example of a pair of images and segmentation masks

 So each image has a corresponding
 segmentation mask, where each color correspond to a different instance.
-Let’s write a ``torch.utils.data.Dataset`` class for this dataset.
+Let’s write a :class:`torch.utils.data.Dataset` class for this dataset.
 In the code below, we are wrapping images, bounding boxes and masks into
-``torchvision.datapoints`` structures so that we will be able to apply torchvision
+``torchvision.datapoints`` classes so that we will be able to apply torchvision
 built-in transformations (`new Transforms API <https://pytorch.org/vision/stable/transforms.html>`_)
-that cover the object detection and segmentation tasks.
-For more information about torchvision datapoints see `this documentation <https://pytorch.org/vision/stable/datapoints.html>`_.
+for the given object detection and segmentation task.
+Namely, image tensors will be wrapped by :class:`torchvision.datapoints.Image`, bounding boxes into
+:class:`torchvision.datapoints.BoundingBoxes` and masks into :class:`torchvision.datapoints.Mask`.
+As datapoints are :class:`torch.Tensor` subclasses, wrapped objects are also tensors and inherit the plain
+:class:`torch.Tensor` API. For more information about torchvision datapoints see
+`this documentation <https://pytorch.org/vision/main/auto_examples/v2_transforms/plot_transforms_v2.html#sphx-glr-auto-examples-v2-transforms-plot-transforms-v2-py>`_.

 .. code:: python

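The "datapoints are tensor subclasses" point added above can be illustrated with a toy pure-Python analogue. The ``BoundingBoxes`` class here is hypothetical and subclasses ``list`` rather than ``torch.Tensor`` (so it runs without torchvision installed), but the pattern is the same: the wrapper carries extra metadata while still behaving like its parent type:

```python
class BoundingBoxes(list):
    """Toy stand-in for torchvision.datapoints.BoundingBoxes: a list
    subclass that carries extra metadata but still behaves like a list."""

    def __init__(self, boxes, format="XYXY"):
        super().__init__(boxes)   # the underlying container is a plain list
        self.format = format      # extra metadata the wrapper carries

boxes = BoundingBoxes([[10, 20, 50, 80], [0, 0, 30, 30]])
assert isinstance(boxes, list)  # still an instance of the parent type...
assert len(boxes) == 2          # ...and inherits the plain list API
print(boxes.format)             # XYXY
```

In torchvision the wrapped objects likewise pass ``isinstance(x, torch.Tensor)`` checks and support the ordinary tensor operations.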
@@ -264,8 +269,8 @@ way of doing it:
        rpn_anchor_generator=anchor_generator,
        box_roi_pool=roi_pooler)

-Object detection model for PennFudan Dataset
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Object detection and instance segmentation model for PennFudan Dataset
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 In our case, we want to finetune from a pre-trained model, given that
 our dataset is very small, so we will be following approach number 1.