RECTANGULAR INFERENCE #232

Closed
glenn-jocher opened this issue Apr 22, 2019 · 56 comments
Labels: Stale (stale and scheduled for closing soon)

@glenn-jocher
Member

glenn-jocher commented Apr 22, 2019

Rectangular inference is implemented by default in detect.py. It reduces inference time in proportion to the letterbox padding saved by using a rectangular image, padded only to the nearest multiple of 32, instead of a square image. On zidane.jpg, for example, CPU inference time (on a 2018 MacBook Pro) drops from 1.01s to 0.63s, a 37% reduction, corresponding to a 38% reduction in image area (416x416 to 256x416).

Square Inference

Letterboxes to 416x416 squares.

python3 detect.py  # 416 square inference
Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='weights/yolov3-spp.weights')
Using CPU
image 1/2 data/samples/bus.jpg: 416x416 1 handbags, 3 persons, 1 buss, Done. (0.999s)
image 2/2 data/samples/zidane.jpg: 416x416 1 ties, 2 persons, Done. (1.008s)

Rectangular Inference

Letterboxes to 416 along longest image dimension, pads shorter dimension to minimum multiple of 32.

python3 detect.py  # 416 rectangular inference
Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='weights/yolov3-spp.weights')
Using CPU
image 1/2 data/samples/bus.jpg: 416x320 1 handbags, 3 persons, 1 buss, Done. (0.767s)
image 2/2 data/samples/zidane.jpg: 256x416 1 ties, 2 persons, Done. (0.632s)
                         zidane.jpg    bus.jpg
Square inference         416x416       416x416
Rectangular inference    256x416       416x320
Original size            1280 × 720    810 × 1080
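
The shapes reported above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the code in detect.py: the helper name rect_shape is hypothetical, and it assumes the long side is scaled to img_size and each side is then rounded up to a multiple of 32.

```python
import math


def rect_shape(height, width, img_size=416, stride=32):
    """Return (height, width) of the letterboxed image.

    Hypothetical helper: scales the long side to img_size, then rounds
    each side up to the next multiple of stride.
    """
    scale = img_size / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    pad_h = math.ceil(new_h / stride) * stride
    pad_w = math.ceil(new_w / stride) * stride
    return pad_h, pad_w


zidane = rect_shape(720, 1280)  # original is 1280 wide, 720 tall
bus = rect_shape(1080, 810)     # original is 810 wide, 1080 tall

# Fraction of pixels saved versus a 416x416 square, and the timing reduction
area_saving = 1 - (zidane[0] * zidane[1]) / (416 * 416)
time_saving = 1 - 0.632 / 1.008

print(zidane, bus)
print(f"area saved: {area_saving:.1%}, time saved: {time_saving:.1%}")
```

For zidane.jpg the short side scales to 234 pixels and is rounded up to 256, which is where the 256x416 shape comes from.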
@glenn-jocher
Member Author

glenn-jocher commented Apr 24, 2019

Rectangular training example in the works, first batch of COCO. This is a bit complicated as we need to letterbox all images in the batch to the same size, and some of the images are being pulled simultaneously by parallel dataloader workers. So part of this process is determining a priori the batch index that each image belongs to (shuffle=False now), and then letterboxing it to the minimum viable multiple of 32 for the most square image in that batch. This should be included in our upcoming v7 release, with enormous training speed improvements (about 1/3 faster on mixed aspect ratio datasets like COCO).

[Image: train_batch0]
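
The a priori batch assignment described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's dataloader: the function name assign_batches is hypothetical, shapes are taken as (width, height) pairs, and images are sorted by height / width so that each batch groups similar aspect ratios.

```python
def assign_batches(shapes_wh, batch_size):
    """Sort images by aspect ratio and fix each image's batch index up front.

    shapes_wh: list of (width, height) pairs (assumed ordering).
    Returns the sorted image order and a dict mapping image -> batch index.
    """
    order = sorted(range(len(shapes_wh)),
                   key=lambda i: shapes_wh[i][1] / shapes_wh[i][0])
    # With shuffling disabled, position in the sorted order decides the batch
    batch_index = {img: pos // batch_size for pos, img in enumerate(order)}
    return order, batch_index


shapes = [(1280, 720), (810, 1080), (640, 640), (1920, 1080)]
order, batch_index = assign_batches(shapes, batch_size=2)
print(order)        # image indices in increasing height / width
print(batch_index)  # batch each image will land in
```

Because every worker can look up the batch index of the image it is loading, it knows which common shape to letterbox to without coordinating with the other workers.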

@glenn-jocher
Member Author

glenn-jocher commented Apr 29, 2019

Rectangular training results on coco_100img.data. Speedup was not material in this case because CUDA was constantly optimizing at each batch due to benchmark set to True, and the dataset of 100 images had only 6 batches, each with a different shape. Speedup should be more impactful on larger training sets. Individual batches were timed as fast as 0.189 seconds here vs 0.240 seconds for 416 square training using a V100.

torch.backends.cudnn.benchmark = True # unsuitable for multiscale

[Image: results]

Rectangular training can be accessed here:

yolov3/utils/datasets.py

Lines 146 to 148 in 7e6e189

# Rectangular Training https://github.com/ultralytics/yolov3/issues/232
self.train_rectangular = False
if self.train_rectangular:

@glenn-jocher
Member Author

glenn-jocher commented Apr 29, 2019

Rectangular inference is now working in our latest iDetection iOS App build! This is a screenshot recorded today at 192x320, inference on vertical 4k format 16:9 aspect ratio iPhone video. This pushes the performance to realtime 30 FPS!! This means that we now have YOLOv3-SPP running in realtime on an iPhone Xs using rectangular inference! This is a worldwide first as far as we know.

@dakdouky

dakdouky commented Dec 5, 2019

Hi @glenn-jocher,

I'm trying rectangular training with rect=True but the tensors during training are all square starting with the input torch.Size([16, 3, 416, 416]), what could be the problem?

I'd expect the shapes to be the nearest multiples of 32 for both image dimensions.

What should be img_size in the line:
self.batch_shapes = np.ceil(np.array(shapes) * img_size / 32.).astype(np.int) * 32

I also noticed that images look rectangular in the test_batch.jpg but square in train_batch.jpg, does this mean that rectangular training is unsupported?

@glenn-jocher
Member Author

@MOHAMEDELDAKDOUKY training uses a mosaic loader, which loads 4 images at a time into a mosaic. You can disable this on this line:

mosaic = True and self.augment # load 4 images at a time into a mosaic (only during training)

@dakdouky

dakdouky commented Dec 5, 2019

Yes, I disabled it but the images are still squares of 416x416.

@glenn-jocher
Member Author

@MOHAMEDELDAKDOUKY your repo may be out of date. git clone a new version and try again.

@dakdouky

dakdouky commented Dec 5, 2019

Done, still getting this with rect=True, mosaic and augmentation disabled.
[Image: train_batch0]

@glenn-jocher
Member Author

@MOHAMEDELDAKDOUKY test.py's dataloader uses rectangular inference. Use the same settings in train.py.

@dakdouky

dakdouky commented Dec 7, 2019

Well, the issue was that I used batch_size=16, which is the entire 16-image set. All images were padded to the size of the clock image in the third row, which is square.

Thanks for your reply!

@glenn-jocher
Member Author

@MOHAMEDELDAKDOUKY ah of course. The batch is padded to the minimum rectangle of the entire group of images, so one square image may cause the batch to be square. Rectangular dataloading is also always in the same order, as the images are loaded in increasing aspect ratio.
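
The effect of a single square image on the batch shape can be illustrated with a small sketch. It follows the min/max aspect-ratio rule quoted later in this thread, with the padding term omitted; the helper name batch_shape is hypothetical, and treating the result as (height, width) with aspect ratio = height / width is an assumption.

```python
import math


def batch_shape(aspect_ratios, img_size=416, stride=32):
    """Smallest common shape for one batch, as (height, width) (assumed order).

    aspect_ratios: height / width of every image in the batch.
    """
    lo, hi = min(aspect_ratios), max(aspect_ratios)
    shape = [1.0, 1.0]            # default: square
    if hi < 1:                    # every image is wider than tall
        shape = [hi, 1.0]
    elif lo > 1:                  # every image is taller than wide
        shape = [1.0, 1.0 / lo]
    # Round each side up to a multiple of the stride
    return tuple(math.ceil(s * img_size / stride) * stride for s in shape)


wide_only = batch_shape([0.5625, 0.6, 0.75])
with_square = batch_shape([0.5625, 0.6, 1.0])
print(wide_only, with_square)
```

In this sketch, adding one image with aspect ratio 1.0 to an otherwise wide batch is enough to make the whole batch square, which matches the behaviour reported above for batch_size=16.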

@mozpp

mozpp commented Jan 22, 2020

It seems that letterbox has already computed the ratio and padding, and scale_coords computes them again. Would it be faster to compute them only once?

def scale_coords(img1_shape, coords, img0_shape, ratio_pad=None):

@glenn-jocher
Member Author

@mozpp yes, the intention is that if ratio_pad is not passed to the function then the padding is computed automatically based on the same assumptions set forth when padding the image originally. Some speedup might be realized by passing this precomputed value, but in profiling this is not a significant hotspot.
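
The two code paths described above can be sketched as follows. This is a simplified pure-Python illustration, not the repository's scale_coords: the name rescale_boxes and the (gain, (pad_x, pad_y)) layout of ratio_pad are hypothetical, shapes are assumed to be (height, width), and the padding is assumed to be split evenly on both sides.

```python
def rescale_boxes(img1_shape, boxes, img0_shape, ratio_pad=None):
    """Map xyxy boxes from the letterboxed image (img1) to the original (img0).

    Shapes are (height, width). If ratio_pad is None, the gain and padding
    are recomputed from the two shapes.
    """
    if ratio_pad is None:
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])
        pad_x = (img1_shape[1] - img0_shape[1] * gain) / 2
        pad_y = (img1_shape[0] - img0_shape[0] * gain) / 2
    else:
        gain, (pad_x, pad_y) = ratio_pad

    def fix(value, pad, limit):
        # Remove padding, undo scaling, clip to the original image bounds
        return min(max((value - pad) / gain, 0), limit)

    h0, w0 = img0_shape
    return [(fix(x1, pad_x, w0), fix(y1, pad_y, h0),
             fix(x2, pad_x, w0), fix(y2, pad_y, h0))
            for x1, y1, x2, y2 in boxes]


boxes = [(104, 75, 208, 140)]
auto = rescale_boxes((256, 416), boxes, (720, 1280))
given = rescale_boxes((256, 416), boxes, (720, 1280), ratio_pad=(0.325, (0.0, 11.0)))
print(auto, given)
```

Both calls return the same boxes; passing the precomputed values only skips a handful of arithmetic operations, which is consistent with this not being a significant hotspot.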

@chouxianyu

If I use this project to convert yolov3 or yolo-spp models to ONNX, does the exported ONNX model support rectangular inference?
@glenn-jocher Looking forward to your reply!

@glenn-jocher
Member Author

@chouxianyu yes. iDetection on iOS runs with rectangular inference using the PyTorch > ONNX > CoreML export pipeline.

@feixiangdekaka

def letterbox(img, new_shape=(416, 416), color=(114, 114, 114), auto=False, scaleFill=False, scaleup=True):

416*416 inference is worse than auto=True ?

@glenn-jocher
Member Author

@feixiangdekaka I don't understand your question.

@WZMIAOMIAO

Hello,
I have a question. Why use the color (114, 114, 114) to fill the border rather than black (0, 0, 0)?

@glenn-jocher
Member Author

@WZMIAOMIAO imagenet mean.

@WZMIAOMIAO

@glenn-jocher Isn't the imagenet mean [123.68, 116.78, 103.94]?

@glenn-jocher
Member Author

@WZMIAOMIAO sure, sum those numbers and divide by 3. We use this because some functions prepopulate with a scalar rather than a vector.
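
The arithmetic suggested above works out as follows. The per-channel values are the ones quoted in the previous comment; truncating the result to an integer is an assumption about how 114 was obtained.

```python
# Per-channel ImageNet mean as quoted in the thread
imagenet_mean_rgb = (123.68, 116.78, 103.94)

# Average the three channels into a single gray level
gray = sum(imagenet_mean_rgb) / len(imagenet_mean_rgb)

# Assumption: truncate to an integer so the value can be used as a scalar fill
fill = int(gray)

print(round(gray, 2), (fill, fill, fill))
```

A single scalar like this can be passed to functions that fill an array with one value, which is the reason given above for not using the three-channel vector.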

@WZMIAOMIAO

Sorry, I don't understand what that means. What if I use [0, 0, 0] to fill the border?

@HYUNKYEONG

Hello, thank you for the good write-up.
I trained with --img 800*800 in train.py.
Is there any way to detect with --img 800*600 in detect.py?
I got "WARNING: --img-size [800, 600] must be multiple of max stride 32, updating to [800, 608]".
I want the detected image size to be 800*600!

@glenn-jocher
Member Author

@HYUNKYEONG specify long side, short side is resolved automatically, i.e. detect.py --img 800

@HYUNKYEONG

Thank you for your answer. I would like to run detection at exactly 800*600.
I ran train.py --img 800 and detect.py --img 800 600, and I got "image at shape (1, 3, 800, 608)".
I want the shape (1, 3, 800, 600). Is it impossible for the short side to be a multiple of 8 instead of a multiple of 32?

@glenn-jocher
Member Author

@HYUNKYEONG YOLOv5 P5 models require image dimensions that are multiples of their maximum stride of 32; P6 models require multiples of 64.
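
A short sketch of why 600 is adjusted to 608. The helper names are hypothetical, and the list of strides (8, 16, 32) is an assumption about a P5 model's detection layers; the point is only that the image side divided by the largest stride has to be a whole number of grid cells.

```python
import math


def round_up_to_stride(size, stride=32):
    """Smallest multiple of stride that is >= size (hypothetical helper)."""
    return math.ceil(size / stride) * stride


def grid_sizes(size, strides=(8, 16, 32)):
    """Feature-map side length at each stride (assumed P5 strides)."""
    return [size / s for s in strides]


print(round_up_to_stride(600))      # size used for a stride-32 model
print(round_up_to_stride(600, 64))  # size a stride-64 model would need
print(grid_sizes(600))              # fractional at strides 16 and 32
print(grid_sizes(608))              # whole numbers at every stride
```

A side of 600 is a multiple of 8, but it does not divide evenly at strides 16 and 32, so a multiple of 8 alone is not enough under this assumption.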

@autograd500

I'm very confused about rectangular training. In yolov5/utils/dataloaders.py, lines 545-568, it pads the images in the same batch into a square shape. Why is it called rectangular training? What do the rectangular images above show?

Thank you in advance.

@glenn-jocher
Member Author

@autograd500 rectangular training refers to the process of letterboxing images in a batch to a common size with a minimum viable multiple of 32 for the most square image. The term "rectangular" here is used to indicate that the images in the batch may have different dimensions, resulting in a rectangular shape after letterboxing. The images shown in the example demonstrate this process, where each image is letterboxed to the same size within the batch. This approach is used to optimize training speed, especially for datasets with mixed aspect ratios like COCO. I hope this clarifies the concept for you. Let me know if you have any further questions.

@autograd500

autograd500 commented Oct 18, 2023

Thanks for the answer, I still have the following questions:

        # Rectangular Training
        if self.rect:
            # Sort by aspect ratio
            s = self.shapes  # wh
            ar = s[:, 1] / s[:, 0]  # aspect ratio
            irect = ar.argsort()
            self.im_files = [self.im_files[i] for i in irect]
            self.label_files = [self.label_files[i] for i in irect]
            self.labels = [self.labels[i] for i in irect]
            self.segments = [self.segments[i] for i in irect]
            self.shapes = s[irect]  # wh
            ar = ar[irect]

            # Set training image shapes
            shapes = [[1, 1]] * nb
            for i in range(nb):
                ari = ar[bi == i]
                mini, maxi = ari.min(), ari.max()
                if maxi < 1:
                    shapes[i] = [maxi, 1]
                elif mini > 1:
                    shapes[i] = [1, 1 / mini]

            self.batch_shapes = np.ceil(np.array(shapes) * img_size / stride + pad).astype(int) * stride

When maxi < 1, every image in the batch has s[1]/s[0] < 1. But the final image size is [img_size * maxi, img_size], where s[0] < s[1]. So, does the aspect ratio of the images change after letterboxing?

@glenn-jocher
Member Author

@autograd500 yes, in rectangular training, the aspect ratio of the images within the batch can be adjusted during the process of letterboxing. The code you shared is responsible for setting the training image shapes based on the aspect ratios of the images.

When maxi < 1, it means that the width of the image (s[0]) is greater than the height (s[1]), resulting in an aspect ratio less than 1. In this case, the code sets the image shape to [maxi * img_size, img_size], which means that the width will be scaled down and the height will remain the same after letterboxing.

So, to answer your question, yes, the aspect ratio of the images can change after letterboxing to achieve a consistent shape within the batch. Let me know if you have any further questions.
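
One way to check the question above is to work through the letterbox geometry on a toy case. This is a sketch under stated assumptions, not the repository's letterbox: the helper name letterbox_geometry is hypothetical, shapes are taken as (height, width), and the image is assumed to be resized with a single scale factor before the remainder is padded.

```python
def letterbox_geometry(img_hw, target_hw):
    """Resized (height, width) and total padding when fitting img_hw in target_hw.

    Assumes one scale factor for both sides, so the picture content keeps
    its proportions and only padding fills the rest of the target shape.
    """
    h, w = img_hw
    r = min(target_hw[0] / h, target_hw[1] / w)
    new_h, new_w = round(h * r), round(w * r)
    pad = (target_hw[0] - new_h, target_hw[1] - new_w)
    return (new_h, new_w), pad


# A 1280x720 (width x height) image in a batch whose widest-to-tallest
# limit is maxi = 0.75, i.e. a target of 320 high by 416 wide at img_size=416
resized, pad = letterbox_geometry((720, 1280), (320, 416))
print(resized, pad)
print(resized[0] / resized[1], 720 / 1280)  # aspect ratio before and after
```

Under these assumptions the content is scaled to 234x416, keeps its 0.5625 height / width ratio, and the remaining 86 rows are padding; only the padded tensor takes the batch's common shape.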

@autograd500

To achieve a consistent shape within the batch, the image shape could also be set to [img_size, maxi * img_size]. In this case, the aspect ratio of the images would stay consistent. I intuitively feel that this would be better, because the proportions of the images are not broken.

Why not set the image shape to [img_size, maxi * img_size]? Doesn't the aspect ratio of the images matter?

@glenn-jocher
Member Author

@autograd500 thank you for your question and suggestion. The aspect ratio of the images does indeed matter in object detection tasks. When training models like YOLOv3, maintaining the original aspect ratio of the images can help preserve the proportions of objects in the scene.

The current approach of setting the image shape to [maxi * img_size, img_size] when maxi < 1 is aimed at ensuring a consistent shape within the batch while still allowing for some variation in aspect ratios. This approach strikes a balance between maintaining the proportions of objects and achieving a common size for efficient batch processing.

However, your idea of setting the image shape to [img_size, maxi * img_size] is interesting and worth considering. It could potentially provide a different trade-off between aspect ratio consistency and preserved object proportions. The choice between the two approaches may depend on the specific requirements and characteristics of the dataset being used.

Thank you for your contribution and for raising this point. It's valuable feedback that could be explored further in future enhancements. Let us know if you have any more questions or suggestions.

  • Glenn Jocher

@autograd500

I have no more questions for the time being. If I do, I will consult you again.

Thank you very much for your reply!

@glenn-jocher
Member Author

@autograd500 hi there,

You're welcome! I'm glad I could help. If you have any more questions or need further assistance in the future, please don't hesitate to reach out. Have a great day!

  • Glenn Jocher

@Plus0591

@glenn-jocher If I want to train a model with input images of size 512x288 and I want the model's input to be fixed, similar to 640x640, what should I do? Why does --rect cause each batch to have different widths and heights? Aren't the neural network inputs supposed to be of fixed size? Thank you.

@glenn-jocher
Member Author

Hi there!

To train a model with a fixed input size of 512x288, you will need to modify the img_size in your training configuration to [512, 288] and deactivate the --rect training option. This setup will ensure that all your inputs are reshaped to 512x288 regardless of their original sizes.

The --rect training option allows for rectangular training, where each batch can adjust its shape according to the aspect ratios of the images within that batch. This is beneficial for mixed aspect ratio datasets, reducing padding and potentially speeding up training. However, the neural network still processes images of a consistent size within each batch.

If you require fixed dimensions for all inputs, simply setting img_size without the --rect option should address your needs. Here’s an example command:

python train.py --img 512 288 --batch-size 16 --data dataset.yaml --weights yolov3.pt

Hope this clears up your query! Let me know if there's anything else you'd like to discuss. 🌟

@Plus0591

@glenn-jocher Sorry, it doesn't work. It shows
usage: train.py [-h] [--weights WEIGHTS] [--cfg CFG] [--data DATA] [--hyp HYP] [--epochs EPOCHS] [--batch-size BATCH_SIZE] [--imgsz IMGSZ] [--rect] [--resume [RESUME]] [--nosave] [--noval] [--noautoanchor] [--evolve [EVOLVE]] [--bucket BUCKET] [--cache [CACHE]] [--image-weights] [--device DEVICE]
[--multi-scale] [--single-cls] [--optimizer {SGD,Adam,AdamW}] [--sync-bn] [--workers WORKERS] [--project PROJECT] [--name NAME] [--exist-ok] [--quad] [--cos-lr] [--label-smoothing LABEL_SMOOTHING] [--patience PATIENCE] [--freeze FREEZE [FREEZE ...]] [--save-period SAVE_PERIOD]
[--local_rank LOCAL_RANK] [--entity ENTITY] [--upload_dataset [UPLOAD_DATASET]] [--bbox_interval BBOX_INTERVAL] [--artifact_alias ARTIFACT_ALIAS]
train.py: error: unrecognized arguments: 288

@Plus0591

@glenn-jocher Excuse me, do you have any idea about this question

@glenn-jocher
Member Author

@Chenplushao hey there!

It looks like you tried to specify separate width and height using --img 512 288, but train.py expects a single number for the --imgsz argument, which sets both the width and height to the same value.

If you need different dimensions and want them to be fixed, you'll have to modify the model configuration file and adjust the input dimensions directly there, as YOLO typically expects square inputs. The other option is to ensure your images are resized to be square while maintaining their aspect ratio through padding before training.

If you have any other questions or need further clarification, feel free to ask. Happy coding! 😊

@Plus0591

Thank you sir! Have a nice day!

@glenn-jocher
Member Author

@Chenplushao You're welcome, and thank you! If you need any more help down the line, don't hesitate to reach out. Have a fantastic day! 😊
