
Inference with pretrained and custom weights: the FPS is much lower than given in README.md #2706

Closed
research-boy opened this issue Apr 5, 2021 · 16 comments
Labels
question (Further information is requested) · Stale (Stale and scheduled for closing soon)

Comments

@research-boy

research-boy commented Apr 5, 2021

Tested models: YOLOv5s, YOLOv5m
Tested GPUs: NVIDIA 2070, NVIDIA Quadro 6000
Tested OS: Ubuntu 20.04, Windows 10
Inference batch size: 1
Image size: 640

As per the table provided in the README:

  • YOLOv5s is listed at 455 FPS, but I measured 34
  • YOLOv5m is listed at 345 FPS, but I measured 20

I would like to know the reason for this gap, and whether there is anything I can change to reproduce the published FPS.

research-boy added the question (Further information is requested) label on Apr 5, 2021
@github-actions
Contributor

github-actions bot commented Apr 5, 2021

👋 Hello @research-boy, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@research-boy instructions for reproducing the table results are directly below the table.

[Screenshot: README footnote with the table reproduction instructions]

@research-boy
Author

@glenn-jocher Does that mean we can reproduce the same FPS on the COCO dataset and with batch size 32?

@glenn-jocher
Member

@research-boy yes
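
For reference, the README of that era points to test.py for reproducing the speed column. A minimal sketch of that kind of invocation, assuming the flag names present in test.py at the time (exact values may differ between releases):

$ python test.py --data coco.yaml --img 640 --conf 0.25 --iou 0.45 --batch-size 32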

@research-boy
Author

research-boy commented Apr 6, 2021

@glenn-jocher For deployment, what are the preferred parameter settings to get optimal FPS?
And is there any way to use batch processing in the LoadStreams class, for trying batch processing on webcam input?

@glenn-jocher
Member

glenn-jocher commented Apr 6, 2021

@research-boy the default settings for best inference under most common use cases are already in place in detect.py and PyTorch Hub model inference. You may want to adapt these as necessary to your custom requirements.

yolov5/detect.py

Lines 149 to 168 in ec8979f

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default='yolov5s.pt', help='model.pt path(s)')
    parser.add_argument('--source', type=str, default='data/images', help='source')  # file/folder, 0 for webcam
    parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='display results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--nosave', action='store_true', help='do not save images/videos')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default='runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    opt = parser.parse_args()
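
As an illustration of how these defaults are used, a typical detect.py invocation built from the flags above might look like the following (the weights path, source folder and device are placeholders):

$ python detect.py --weights yolov5s.pt --source data/images --img-size 640 --conf-thres 0.25 --device 0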

For PyTorch Hub inference see PyTorch Hub tutorial:

YOLOv5 Tutorials

@research-boy
Author

@glenn-jocher Is there any way to use batch processing in the LoadStreams class, for trying batch processing on webcam input?

@glenn-jocher
Member

@research-boy The LoadStreams dataloader automatically runs batched inference. If you supply 32 streams, then your batch size is 32.
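
A rough sketch of that, as I understand detect.py at this point: a *.txt source is treated as a list of streams, LoadStreams opens each line as one source, and one frame per stream is batched together (the stream URLs below are placeholders):

$ cat streams.txt
0
rtsp://192.168.1.10:554/live
rtsp://192.168.1.11:554/live
$ python detect.py --weights yolov5s.pt --source streams.txt   # 3 streams -> batch size 3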

@vtyw

vtyw commented Apr 7, 2021

@research-boy the default settings for best inference under most common use cases are already in place in detect.py and PyTorch Hub model inference. You may want to adapt these as necessary to your custom requirements.

@glenn-jocher What's the difference between using detect.py and hubconf.py?

I'm surprised that the Hub version seems to process a whole list of images as one batch, as opposed to dividing the images into batches of a fixed size. The point of using a fixed batch size is to find the sweet spot between inference time (which increases sub-linearly with batch size) and the time to load each batch.
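
If a fixed batch size is wanted with the Hub model, one simple approach is to chunk the image list manually. A minimal sketch, assuming hypothetical image paths and an illustrative batch size:

import torch

# Load YOLOv5s from PyTorch Hub (autoShape-wrapped model)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Hypothetical image paths to process
imgs = [f'frames/{i:06d}.jpg' for i in range(128)]

batch_size = 8  # fixed batch size chosen from benchmarking
for i in range(0, len(imgs), batch_size):
    batch = imgs[i:i + batch_size]  # one fixed-size chunk
    results = model(batch)          # single forward pass for the chunk
    results.save()                  # or .print() / .pandas()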

@glenn-jocher
Member

@vtyw detect.py is a fully managed command-line YOLOv5 inference solution.

YOLOv5 PyTorch Hub models with autoShape() wrappers are python inference solutions suitable for integration into custom projects. For batch-size 1 inference with these models you can simply pass one image at a time.
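
A minimal sketch of that batch-size-1 usage (the image URL is one already used in this repo's examples):

import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Batch-size 1: pass a single image (path, URL, PIL image or numpy array) per call
results = model('https://github.com/ultralytics/yolov5/raw/master/data/images/zidane.jpg')
results.print()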

@vtyw

vtyw commented Apr 8, 2021

@glenn-jocher Here's what I collected as an example:

Using yolov5s inference on 640x640 images on a single RTX 2080:

Batch size | Inference FPS | FPS including output | GPU Mem (MiB)
---------- | ------------- | -------------------- | -------------
1          | 101           | 54                   | 1155
2          | 177           | 73                   | 1169
4          | 239           | 82                   | 1239
8          | 252           | 88                   | 1333
16         | 238           | 89                   | 1573

From this I could form a conclusion such as: using a larger batch size improves FPS up to n = 8, and the increase in GPU memory is insignificant. Many object detection libraries offer batch processing as a built-in feature, which is why I was confused that batch processing is mentioned in the README and in some code comments but isn't actually a built-in feature as such.
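
For context, numbers like these can be gathered with a small timing loop around the Hub model. A rough sketch, assuming a CUDA device and illustrative image paths; this measures end-to-end throughput (pre-processing, inference and NMS), not pure model time:

import time
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Illustrative test images, repeated to fill the batches
imgs = ['data/images/zidane.jpg'] * 64

for batch_size in (1, 2, 4, 8, 16):
    batches = [imgs[i:i + batch_size] for i in range(0, len(imgs), batch_size)]
    torch.cuda.synchronize()
    t0 = time.time()
    for batch in batches:
        model(batch)  # batched forward pass
    torch.cuda.synchronize()
    fps = len(imgs) / (time.time() - t0)
    print(f'batch {batch_size}: {fps:.0f} images/s')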

@glenn-jocher
Member

glenn-jocher commented Apr 8, 2021

@vtyw nice table! Batched inference is the automatic default when more than 1 image is passed for inference in our YOLOv5 PyTorch Hub solution:

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Images
dir = 'https://github.com/ultralytics/yolov5/raw/master/data/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()

[Screenshot: printed results for the two-image batch]

See PyTorch Hub tutorial for details:

YOLOv5 Tutorials

@zeyad-mansour

Why is batched processing (multiple images at once) faster?

Is there a way to achieve that speed while processing images one at a time (stream processing)?

@research-boy
Author

research-boy commented May 1, 2021

@zeyad-mansour If you are specifically looking to deploy the model on an edge device, try TensorRT with different precisions; this should give better FPS.
About batch processing: here the images are cached into memory and inference is run on the whole batch, whereas with live video you grab frame by frame and load each frame into memory before running inference on it.
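
As a rough sketch of that TensorRT route (the export script and flags reflect the repo around this time, and the trtexec options are standard TensorRT; exact details may differ by version):

$ python models/export.py --weights yolov5s.pt --img-size 640 --batch-size 1   # writes yolov5s.onnx (among other formats)
$ trtexec --onnx=yolov5s.onnx --saveEngine=yolov5s.engine --fp16               # build an FP16 TensorRT engine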

@zeyad-mansour

@research-boy That's what I figured. Thanks for the explanation!

@github-actions
Contributor

github-actions bot commented Jun 4, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the Stale (Stale and scheduled for closing soon) label on Jun 4, 2021