Transfer Learning - Object Detection
Next, we'll train our own SSD-Mobilenet object detection model using PyTorch and the Open Images dataset. SSD-Mobilenet is a popular network architecture for real-time object detection on mobile and embedded devices. It combines the SSD-300 Single-Shot MultiBox Detector with a Mobilenet backbone.
In the example below, we'll train a custom detection model that locates 8 different varieties of fruit, although you are welcome to pick from any of the 600 classes in the Open Images dataset to train your model on. You can browse the dataset visually on the Open Images website.
To get started, first make sure that you have JetPack 4.4 or newer and PyTorch installed for Python 3.6 on your Jetson. JetPack 4.4 includes TensorRT 7.1, which is the minimum TensorRT version that supports loading SSD-Mobilenet via ONNX. The PyTorch scripts used for training SSD-Mobilenet are written for Python 3, so PyTorch should be installed for Python 3.6.
The PyTorch code for training SSD-Mobilenet is found in the repo under `jetson-inference/python/training/detection/ssd`. If you aren't running the Docker container, a couple of steps are required before using it:
```bash
# you only need to run these if you aren't using the container
$ cd jetson-inference/python/training/detection/ssd
$ wget https://nvidia.box.com/shared/static/djf5w54rjvpqocsiztzaandq1m3avr7c.pth -O models/mobilenet-v1-ssd-mp-0_675.pth
$ pip3 install -v -r requirements.txt
```
This downloads the base model to `ssd/models` and installs some required Python packages (these are already installed in the container). The base model was already pre-trained on a different dataset (PASCAL VOC), so we don't need to train SSD-Mobilenet from scratch, which would take much longer. Instead, we'll use transfer learning to fine-tune it to detect new object classes of our choosing.
The Open Images dataset contains 600 object classes that you can pick and choose from. A script called `open_images_downloader.py` is provided, and it will automatically download the desired object classes for you.
note: the fewer classes used, the faster the model will run during inference. Depending on the classes you pick, Open Images can also contain hundreds of gigabytes of data. Before downloading your own classes, see the Limiting the Amount of Data section below.
The classes that we'll be using are `"Apple,Orange,Banana,Strawberry,Grape,Pear,Pineapple,Watermelon"`, for example for a fruit-picking robot. You are welcome to substitute your own choices from the class list. The fruit classes have 6360 images in total, which is a happy medium.
```bash
$ python3 open_images_downloader.py --class-names "Apple,Orange,Banana,Strawberry,Grape,Pear,Pineapple,Watermelon" --data=data/fruit
...
2020-07-09 16:20:42 - Starting to download 6360 images.
2020-07-09 16:20:42 - Downloaded 100 images.
2020-07-09 16:20:42 - Downloaded 200 images.
2020-07-09 16:20:42 - Downloaded 300 images.
2020-07-09 16:20:42 - Downloaded 400 images.
2020-07-09 16:20:42 - Downloaded 500 images.
2020-07-09 16:20:46 - Downloaded 600 images.
...
2020-07-09 16:32:12 - Task Done.
```
By default, the dataset will be downloaded to the `data/` directory under `jetson-inference/python/training/detection/ssd`, which is automatically mounted into the container. You can change that by specifying the `--data=<PATH>` option. Depending on the size of your dataset, it may be necessary to use external storage. If you download multiple datasets, store each one in its own subdirectory.
Limiting the Amount of Data
Depending on the classes that you select, Open Images can contain lots of data. In some cases it is too much to train on in a reasonable amount of time for our purposes. In particular, the classes containing people and vehicles have a very large number of images (>250GB).
So when selecting your own classes, it's recommended to first run the downloader script with the `--stats-only` option before downloading the data. This shows how many images there are for your classes, without actually downloading any images.
```bash
$ python3 open_images_downloader.py --stats-only --class-names "Apple,Orange,Banana,Strawberry,Grape,Pear,Pineapple,Watermelon" --data=data/fruit
...
2020-07-09 16:18:06 - Total available images: 6360
2020-07-09 16:18:06 - Total available boxes: 27188
-------------------------------------
'train' set statistics
-------------------------------------
Image count: 5145
Bounding box count: 23539
Bounding box distribution:
Strawberry: 7553/23539 = 0.32
Orange: 6186/23539 = 0.26
Apple: 3622/23539 = 0.15
Grape: 2560/23539 = 0.11
Banana: 1574/23539 = 0.07
Pear: 757/23539 = 0.03
Watermelon: 753/23539 = 0.03
Pineapple: 534/23539 = 0.02
...
-------------------------------------
Overall statistics
-------------------------------------
Image count: 6360
Bounding box count: 27188
```
note: `--stats-only` does download the annotation data (~1GB), but not the images.
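Each value in the bounding box distribution is that class's share of all training boxes. The rough sketch below is not part of the downloader. It recomputes those shares from the train-split counts copied out of the `--stats-only` output above. It also reports how unbalanced the classes are, as the ratio between the most and least frequent class. `box_distribution` and `imbalance_ratio` are hypothetical helper names.

```python
# Train-split bounding box counts, copied from the --stats-only output above.
TRAIN_BOXES = {
    "Strawberry": 7553,
    "Orange": 6186,
    "Apple": 3622,
    "Grape": 2560,
    "Banana": 1574,
    "Pear": 757,
    "Watermelon": 753,
    "Pineapple": 534,
}


def box_distribution(counts):
    """Return each class's fraction of the total number of boxes (hypothetical helper)."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def imbalance_ratio(counts):
    """Ratio between the most frequent and the least frequent class."""
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    total = sum(TRAIN_BOXES.values())
    for name, frac in box_distribution(TRAIN_BOXES).items():
        print(f"{name}: {TRAIN_BOXES[name]}/{total} = {frac:.2f}")
    print(f"most/least frequent class ratio: {imbalance_ratio(TRAIN_BOXES):.1f}")
```

For the fruit classes, Strawberry has roughly 14 times as many boxes as Pineapple. That is the kind of unbalanced distribution where `--max-annotations-per-class` is described as useful.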
In practice, to keep the training time (and disk space) down, you probably want to keep the total number of images under 10K. The trade-off is that the more images you use, the more accurate your model will generally be. You can limit the amount of data downloaded with the `--max-images` option or the `--max-annotations-per-class` option:

- `--max-images` limits the total dataset to the specified number of images, while keeping the distribution of images per class roughly the same as the original dataset. If one class has more images than another, the ratio will remain roughly the same.
- `--max-annotations-per-class` limits each class to the specified number of bounding boxes. If a class has fewer than that number available, all of its data will be used. This is useful if the distribution of data is unbalanced across classes.

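The difference between the two options can be shown with a toy model of each. This sketch is not the downloader's implementation:

- `proportional_limit` scales every class by the same factor, which is the idea behind `--max-images` keeping the class ratios.
- `cap_per_class` truncates each class at a fixed count, which is the idea behind `--max-annotations-per-class`.

Both helper names are hypothetical. For simplicity, they are applied to the per-class box counts from the `--stats-only` output above rather than to an actual image selection.

```python
# Per-class bounding box counts from the train split in the --stats-only output above.
TRAIN_BOXES = {
    "Strawberry": 7553,
    "Orange": 6186,
    "Apple": 3622,
    "Grape": 2560,
    "Banana": 1574,
    "Pear": 757,
    "Watermelon": 753,
    "Pineapple": 534,
}


def proportional_limit(counts, max_total):
    """Scale every class by the same factor so the total is at most max_total.

    Integer floor division keeps the sum <= max_total and approximately
    preserves the ratios between classes (hypothetical helper).
    """
    total = sum(counts.values())
    if total <= max_total:
        return dict(counts)
    return {name: n * max_total // total for name, n in counts.items()}


def cap_per_class(counts, cap):
    """Keep at most `cap` items per class; smaller classes keep all of their data."""
    return {name: min(n, cap) for name, n in counts.items()}


if __name__ == "__main__":
    print("proportional:", proportional_limit(TRAIN_BOXES, 2500))
    print("capped:      ", cap_per_class(TRAIN_BOXES, 1000))
```

With proportional scaling, Strawberry still dominates (802 vs. 56 for Pineapple). With a per-class cap of 1000, the five largest classes are cut to 1000 each while Pear, Watermelon and Pineapple keep everything.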
For example, if you wanted to only use 2500 images for the fruit dataset, you would launch the downloader like this:
```bash
$ python3 open_images_downloader.py --max-images=2500 --class-names "Apple,Orange,Banana,Strawberry,Grape,Pear,Pineapple,Watermelon" --data=data/fruit
```
If neither the `--max-images` option nor the `--max-annotations-per-class` option is set, all of the available data will be downloaded by default. Be sure to check the amount of data beforehand with `--stats-only`. Unfortunately, it isn't possible to determine the actual disk space required by the images in advance. A general rule of thumb for this dataset is to budget ~350KB per image (~2GB for the fruits).
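The rule of thumb is simple arithmetic that can be combined with the image count reported by `--stats-only`. The minimal sketch below makes three assumptions:

- decimal units (1 GB = 1,000,000 KB);
- the ~350KB-per-image figure from the text;
- a hypothetical helper name, `estimate_disk_gb`.

```python
# Rule-of-thumb average image size for this dataset, taken from the text.
KB_PER_IMAGE = 350


def estimate_disk_gb(num_images, kb_per_image=KB_PER_IMAGE):
    """Rough disk budget in GB, using decimal units (1 GB = 1,000,000 KB)."""
    return num_images * kb_per_image / 1_000_000


if __name__ == "__main__":
    # 6360 = "Total available images" reported by --stats-only for the fruit classes
    print(f"all fruit images:  ~{estimate_disk_gb(6360):.1f} GB")
    # 2500 = the --max-images value used in the example above
    print(f"--max-images=2500: ~{estimate_disk_gb(2500):.1f} GB")
```

For the full fruit set this gives about 2.2GB, consistent with the ~2GB estimate in the text.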
Below is approximate SSD-Mobilenet training performance to help estimate the time required for training:
Device | Images/sec | Time per epoch* |
---|---|---|
Nano | 4.77 | 17 min 55 sec |
Xavier NX | 14.65 | 5 min 50 sec |

*Measured on the fruits dataset (5145 training images, batch size 4).
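The table can be turned into a rough estimate for a whole training run. The sketch below is plain arithmetic, not a benchmark:

- Steps per epoch is the number of training images divided by the batch size, rounded up.
- Time per epoch is the number of images divided by images/sec.

With 5145 images and batch size 4, this gives 1287 steps, which matches the `Step: .../1287` count in the training log further down this page. It also roughly reproduces the measured epoch times. The function names are hypothetical.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of mini-batches per epoch (the last batch may be partial)."""
    return math.ceil(num_images / batch_size)


def epoch_seconds(num_images, images_per_sec):
    """Approximate wall-clock time of one epoch, in seconds."""
    return num_images / images_per_sec


def training_hours(num_images, images_per_sec, epochs):
    """Approximate wall-clock time for the whole run, in hours."""
    return epoch_seconds(num_images, images_per_sec) * epochs / 3600


if __name__ == "__main__":
    train_images = 5145  # fruit training split from the --stats-only output
    print("steps/epoch:", steps_per_epoch(train_images, batch_size=4))
    for device, ips in [("Nano", 4.77), ("Xavier NX", 14.65)]:
        minutes, seconds = divmod(round(epoch_seconds(train_images, ips)), 60)
        hours = training_hours(train_images, ips, epochs=30)
        print(f"{device}: ~{minutes} min {seconds} sec/epoch, ~{hours:.1f} h for 30 epochs")
```

By this estimate, the default 30 epochs take about 9 hours on a Nano and about 3 hours on a Xavier NX. 100 epochs take proportionally longer.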
Once your data has finished downloading, run the `train_ssd.py` script to launch the training:
```bash
python3 train_ssd.py --data=data/fruit --model-dir=models/fruit --batch-size=4 --epochs=30
```
note: if you run out of memory or your process is "killed" during training, try mounting swap and disabling the desktop GUI. To save memory, you can also reduce the `--batch-size` (default 4) and `--workers` (default 2).
Here are some common options that you can run the training script with:
Argument | Default | Description |
---|---|---|
`--data` | `data/` | the location of the dataset |
`--model-dir` | `models/` | directory to output the trained model checkpoints |
`--resume` | None | path to an existing checkpoint to resume training from |
`--batch-size` | 4 | try increasing depending on available memory |
`--epochs` | 30 | up to 100 is desirable, but will increase training time |
`--workers` | 2 | number of data loader threads (0 = disable multithreading) |
Over time, you should see the loss decreasing:
```bash
2020-07-10 13:14:12 - Epoch: 0, Step: 10/1287, Avg Loss: 12.4240, Avg Regression Loss 3.5747, Avg Classification Loss: 8.8493
2020-07-10 13:14:12 - Epoch: 0, Step: 20/1287, Avg Loss: 9.6947, Avg Regression Loss 4.1911, Avg Classification Loss: 5.5036
2020-07-10 13:14:13 - Epoch: 0, Step: 30/1287, Avg Loss: 8.7409, Avg Regression Loss 3.4078, Avg Classification Loss: 5.3332
2020-07-10 13:14:13 - Epoch: 0, Step: 40/1287, Avg Loss: 7.3736, Avg Regression Loss 2.5356, Avg Classification Loss: 4.8379
2020-07-10 13:14:14 - Epoch: 0, Step: 50/1287, Avg Loss: 6.3461, Avg Regression Loss 2.2286, Avg Classification Loss: 4.1175
...
2020-07-10 13:19:26 - Epoch: 0, Validation Loss: 5.6730, Validation Regression Loss 1.7096, Validation Classification Loss: 3.9634
2020-07-10 13:19:26 - Saved model models/fruit/mb1-ssd-Epoch-0-Loss-5.672993580500285.pth
```
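To track the loss over a long run (for example, to plot it), you can parse the per-step log lines. The sketch below assumes the exact line format shown above. Note that the regression-loss field has no colon. `parse_step_losses` is a hypothetical helper, not part of `train_ssd.py`. In the lines shown, the average loss equals the regression loss plus the classification loss, up to rounding, and the sketch checks that relationship.

```python
import re

# Matches the per-step lines in the format shown in the training log above.
STEP_RE = re.compile(
    r"Epoch: (\d+), Step: (\d+)/(\d+), Avg Loss: ([\d.]+), "
    r"Avg Regression Loss ([\d.]+), Avg Classification Loss: ([\d.]+)"
)

SAMPLE_LOG = """\
2020-07-10 13:14:12 - Epoch: 0, Step: 10/1287, Avg Loss: 12.4240, Avg Regression Loss 3.5747, Avg Classification Loss: 8.8493
2020-07-10 13:14:12 - Epoch: 0, Step: 20/1287, Avg Loss: 9.6947, Avg Regression Loss 4.1911, Avg Classification Loss: 5.5036
2020-07-10 13:14:13 - Epoch: 0, Step: 30/1287, Avg Loss: 8.7409, Avg Regression Loss 3.4078, Avg Classification Loss: 5.3332
2020-07-10 13:14:13 - Epoch: 0, Step: 40/1287, Avg Loss: 7.3736, Avg Regression Loss 2.5356, Avg Classification Loss: 4.8379
2020-07-10 13:14:14 - Epoch: 0, Step: 50/1287, Avg Loss: 6.3461, Avg Regression Loss 2.2286, Avg Classification Loss: 4.1175
2020-07-10 13:19:26 - Epoch: 0, Validation Loss: 5.6730, Validation Regression Loss 1.7096, Validation Classification Loss: 3.9634
"""


def parse_step_losses(log_text):
    """Return (epoch, step, total, regression, classification) tuples.

    Lines that don't match the per-step format (e.g. validation lines) are skipped.
    """
    rows = []
    for match in STEP_RE.finditer(log_text):
        epoch, step = int(match.group(1)), int(match.group(2))
        total, regression, classification = (float(g) for g in match.group(4, 5, 6))
        rows.append((epoch, step, total, regression, classification))
    return rows


if __name__ == "__main__":
    for epoch, step, total, regression, classification in parse_step_losses(SAMPLE_LOG):
        # The logged values are rounded to 4 decimals, so allow a small tolerance.
        consistent = abs(total - (regression + classification)) < 1e-3
        print(f"epoch {epoch} step {step}: loss {total:.4f} (sum consistent: {consistent})")
```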
If you want to test your model before all of the epochs have completed training, you can press Ctrl+C to stop the training script. You can resume it later with the `--resume=<CHECKPOINT>` argument. You can download the fruit model that was already trained for 100 epochs here.
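Each saved checkpoint encodes its epoch and loss in the filename. In the log above, that loss matches the epoch's validation loss. The small sketch below chooses a checkpoint in one of two ways:

- the latest epoch, which is the natural choice for `--resume`;
- the lowest loss, for testing.

It assumes the `mb1-ssd-Epoch-<N>-Loss-<L>.pth` naming seen in the log. `select_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path

# Filename pattern seen in the training log, e.g. mb1-ssd-Epoch-0-Loss-5.672993580500285.pth
CKPT_RE = re.compile(r"-Epoch-(\d+)-Loss-([\d.]+)\.pth$")


def select_checkpoint(paths, by="loss"):
    """Pick a checkpoint path: by="loss" -> lowest loss, by="epoch" -> latest epoch.

    Files whose names don't match the pattern are ignored; returns None if none match.
    """
    candidates = []
    for p in paths:
        match = CKPT_RE.search(Path(p).name)
        if match:
            candidates.append((int(match.group(1)), float(match.group(2)), str(p)))
    if not candidates:
        return None
    if by == "epoch":
        return max(candidates, key=lambda c: c[0])[2]
    if by == "loss":
        return min(candidates, key=lambda c: c[1])[2]
    raise ValueError(f"unknown criterion: {by!r}")


if __name__ == "__main__":
    # models/fruit is the --model-dir used in the training command above.
    ckpts = sorted(Path("models/fruit").glob("*.pth"))
    print("latest epoch:", select_checkpoint(ckpts, by="epoch"))
    print("lowest loss: ", select_checkpoint(ckpts, by="loss"))
```

You could pass the latest-epoch path to `--resume`. The lowest-loss path is the one you would typically want to keep for testing.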
Next we need to convert our trained model from PyTorch to ONNX, so that we can load it with TensorRT:
```bash
python3 onnx_export.py --model-dir=models/fruit
```
This will save a model called `ssd-mobilenet.onnx` under `jetson-inference/python/training/detection/ssd/models/fruit/`.
To detect objects in some static test images, we'll use the extended command-line parameters of `detectnet` (or `detectnet.py`) to load our custom SSD-Mobilenet ONNX model. To run these commands, the working directory of your terminal should still be `jetson-inference/python/training/detection/ssd/`.
```bash
IMAGES=<path-to-your-jetson-inference>/data/images   # substitute your jetson-inference path here

detectnet --model=models/fruit/ssd-mobilenet.onnx --labels=models/fruit/labels.txt \
          --input-blob=input_0 --output-cvg=scores --output-bbox=boxes \
          "$IMAGES/fruit_*.jpg" $IMAGES/test/fruit_%i.jpg
```
note: `detectnet.py` can be substituted above to run the Python version of the program.
Below are some of the images output to the `$IMAGES/test` directory:
You can also try running your re-trained fruit model on a camera or video stream like below:
```bash
detectnet --model=models/fruit/ssd-mobilenet.onnx --labels=models/fruit/labels.txt \
          --input-blob=input_0 --output-cvg=scores --output-bbox=boxes \
          csi://0
```
For more details about other camera/video sources, please see Camera Streaming and Multimedia.
© 2016-2020 NVIDIA