This is the set of tools and configurations used by the YOLO Real-Time Food Detection article at http://bennycheung.github.io/yolo-for-real-time-food-detection
The following instructions concentrate on YOLO v2 setup and training. To get DarkNet YOLO training to work, we need:
- Object bounding box file for each image
- Class name file for all the category names
- Training dataset file for the list of training images
- Validating dataset file for the list of validating images
- Configuration file for YOLO neural network specification
- Data location file for finding all the data information
Here is my GitHub repository <> of tools to help; check them out if in doubt! It also contains my configuration `.cfg`, class name `.names`, and data location `.data` files described later. They are designed or modified to work with DarkNet's requirements for bounding boxes and training data.
- `food100_generate_bbox_file.py` - python script to take each Food100 class image directory's `bb_info.txt` and create the individual bbox files.
- `food100_split_for_yolo.py` - python script to split all Food100 class images into (1) a `train.txt` training image list and (2) a `test.txt` validating image list.
- `food100_tk_label_bbox.py` - python script with a UI to help draw the bboxes associated with the images.
After downloading and unpacking the Food100 dataset (UEC FOOD 100), it requires post-processing to produce bounding boxes that fit DarkNet's YOLO training requirements.
The Food100 classes look like:
| food_id | name |
|---|---|
| 1 | rice |
| 2 | eels_on_rice |
| 3 | pilaf |
| 4 | chicken-n-egg_on_rice |
| 5 | pork_cutlet_on_rice |
| 6 | beef_curry |
| 7 | sushi |
| ... | up to 100 classes |
DarkNet YOLO (both v2 and v3) expects the class number to be 0-based, so use `food_id - 1` to get the class number in each label line:

```
<object-class-id> <center-X> <center-Y> <width> <height>
```

Originally, I forgot to subtract 1 for the `<object-class-id>`; that made me think the method was not working and I almost gave up! After re-aligning to 0-based class ids, the detection shows correct results.
DarkNet YOLO expects a bounding box `.txt` file for each `.jpg` image file, in the same directory and with the same name but with a `.txt` extension, containing the object class number and object coordinates for that image. If we name our food100 image directory `images`, then DarkNet will automatically look for the corresponding `.txt` file in the `labels` directory. For example, `images/1/2.jpg` will look for its corresponding label in `labels/1/2.txt`. I like this approach because we can keep the bbox being edited in `images/1/2.txt`, while `labels/1/2.txt` holds the bbox format required for YOLO training.
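To make this convention concrete, here is a minimal sketch of how an image path maps to its label path under the `images`/`labels` layout; the helper name is my own and this is not DarkNet's actual lookup code.

```python
# Sketch of the images/ -> labels/ path convention described above.
# label_path_for is an illustrative helper, not part of DarkNet or the repo scripts.
def label_path_for(image_path):
    """Swap the images directory for labels and the .jpg extension for .txt."""
    return image_path.replace("/images/", "/labels/").rsplit(".", 1)[0] + ".txt"

print(label_path_for("data/food100/images/1/2.jpg"))
# -> data/food100/labels/1/2.txt
```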
For each food class directory, there is a `bb_info.txt` that contains the bboxes for all of its image files. The original bbox is specified as:

```
<image-number> <top-left-X> <top-left-Y> <bottom-right-X> <bottom-right-Y>
```

However, YOLO expects each `.jpg` image file to have a corresponding bbox description `.txt` file. The bbox description file is specified as:

```
<object-class-id> <center-X> <center-Y> <width> <height>
```

- `<object-class-id>` - integer object class number from 0 to (classes - 1), i.e. `food_id - 1`
- `<center-X> <center-Y> <width> <height>` - float values relative to the width and height of the image; they range from (0.0 to 1.0]
- for example: `<center-X> = <absolute_x> / <image_width>` or `<height> = <absolute_height> / <image_height>`
- attention: `<center-X> <center-Y>` are the center of the bounding box (not the top-left corner)
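Converting from the `bb_info.txt` corner format to the YOLO center format is just a little arithmetic. Below is a minimal sketch of that conversion, not the actual `food100_generate_bbox_file.py` script; the function name and the assumption that the image size is already known are mine.

```python
# Sketch: convert one bb_info.txt entry (corner coordinates) into a YOLO label line.
# Assumes the image width/height are known (e.g. read with Pillow); the helper
# name and signature are illustrative only.
def to_yolo_bbox(food_id, x1, y1, x2, y2, img_w, img_h):
    """Top-left/bottom-right corners -> normalized center, width, height."""
    center_x = (x1 + x2) / 2.0 / img_w
    center_y = (y1 + y2) / 2.0 / img_h
    width = (x2 - x1) / float(img_w)
    height = (y2 - y1) / float(img_h)
    class_id = food_id - 1  # DarkNet class ids are 0-based
    return "%d %.6f %.6f %.6f %.6f" % (class_id, center_x, center_y, width, height)

# e.g. food_id 1 (rice) with bbox (10, 20, 210, 220) in a 400x300 image:
print(to_yolo_bbox(1, 10, 20, 210, 220, 400, 300))
# -> 0 0.275000 0.400000 0.500000 0.666667
```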
The data file tells DarkNet where to find the training and validation data, `food100.data`:
```
classes = 100
train = /Users/bcheung/dev/ML/darknet/data/food100/train.txt
valid = /Users/bcheung/dev/ML/darknet/data/food100/test.txt
names = /Users/bcheung/dev/ML/darknet/data/food100/food100.names
backup = backup
```
- `classes = 100` is the number of classes; it should equal the line count of the class name file
- `train` is the file listing all training images, one image per line
- `valid` is the file listing all validating images, one image per line
- `names` is the file that lists the class names
- `backup` is where the intermediate and final `.weights` files will be written
Obviously, change these paths to your specific data locations.
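A quick consistency check can catch most mistakes in this file (wrong paths, a class count that does not match the names file). The sketch below assumes the simple `key = value` layout shown above; the function name and paths are illustrative only.

```python
# Sketch: sanity-check a DarkNet .data file. Illustrative only.
def check_data_file(path):
    cfg = {}
    with open(path) as f:
        for line in f:
            if "=" in line:
                key, value = line.split("=", 1)
                cfg[key.strip()] = value.strip()

    # classes should equal the number of lines in the .names file
    with open(cfg["names"]) as f:
        name_count = sum(1 for line in f if line.strip())
    assert int(cfg["classes"]) == name_count, "classes must match the .names line count"

    # the train/valid list files must exist
    for key in ("train", "valid"):
        open(cfg[key]).close()

check_data_file("data/food100/food100.data")
```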
The `train.txt` file lists all the training images, one image per line. For example:

```
/Users/bcheung/dev/ML/darknet/data/food100/images/61/6164.jpg
/Users/bcheung/dev/ML/darknet/data/food100/images/61/6170.jpg
/Users/bcheung/dev/ML/darknet/data/food100/images/61/6158.jpg
...
```
There should be no images from the validating dataset in this list.
The `test.txt` file lists all the validating images, one image per line. For example:

```
/Users/bcheung/dev/ML/darknet/data/food100/images/61/6990.jpg
/Users/bcheung/dev/ML/darknet/data/food100/images/61/6149.jpg
/Users/bcheung/dev/ML/darknet/data/food100/images/61/6099.jpg
...
```
There should be no images from the training dataset in this list.
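The repository's `food100_split_for_yolo.py` handles this split; the sketch below only illustrates the idea with an assumed 80/20 random split over the `images/<class-id>/` directories, so the details may differ from the actual script.

```python
# Sketch of an 80/20 train/validation split over images/<class-id>/*.jpg.
# Ratio, paths and layout are assumptions for illustration.
import glob
import random

random.seed(42)
images = glob.glob("data/food100/images/*/*.jpg")
random.shuffle(images)

split = int(0.8 * len(images))
with open("data/food100/train.txt", "w") as f:
    f.write("\n".join(images[:split]) + "\n")
with open("data/food100/test.txt", "w") as f:
    f.write("\n".join(images[split:]) + "\n")
```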
The class names are specified in `food100.names`. The class id is implied by the 0-based line number, i.e. `rice` is class `0`. There are no class ids in the file, just names:
```
rice
eels-on-rice
pilaf
chicken-n-egg-on-rice
pork-cutlet-on-rice
beef-curry
sushi
...
```
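Since the class id is just the 0-based line index, loading the names into a list gives the id-to-name mapping directly; a tiny illustrative snippet:

```python
# The class id is the 0-based line index in food100.names.
with open("data/food100/food100.names") as f:
    names = [line.strip() for line in f if line.strip()]

print(names[0])              # -> rice (class 0)
print(names.index("sushi"))  # -> the class id of sushi
```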
We need to create a configuration file, `yolov2-food100.cfg`. You can copy darknet's `cfg/yolov2-voc.cfg` and make modifications:
```
[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=8
...
```
The `batch` and `subdivisions` settings control how many mini-batches each batch is split into:
- `batch=64` -> load 64 images for this "iteration".
- `subdivisions=8` -> split the batch into 8 "mini-batches", so 64/8 = 8 images per mini-batch get sent to the GPU for processing.
That is repeated 8 times until the batch is completed, and a new iteration starts with 64 new images. When batching, you are averaging over more images; the intent is not only to speed up the training process but also to generalize the training more. If your GPU has enough memory, you can reduce `subdivisions` to load more images onto the GPU at the same time.
Further edit the `classes` and `filters` specifications in the configuration file. The number of filters in the last convolutional layer is calculated by (#-of-classes + 5)*5, e.g. (100 + 5)*5 = 525, where 5 is the number of anchor boxes in YOLOv2:
- edit line 237: `filters=525`
- edit line 244: `classes=100`
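As a quick sanity check, the filter count can be recomputed from the class and anchor counts (YOLOv2's region layer uses 5 anchors by default):

```python
# Filters in the conv layer before the region layer:
# num_anchors * (num_classes + 5 box/objectness terms)
def yolov2_filters(num_classes, num_anchors=5):
    return num_anchors * (num_classes + 5)

print(yolov2_filters(100))  # -> 525
```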