
YOLOv5 P6 Models 😃 #2110

Closed · glenn-jocher opened this issue Feb 1, 2021 · 14 comments

Labels: enhancement (New feature or request), Stale (stale and scheduled for closing soon)
@glenn-jocher (Member)
glenn-jocher commented Feb 1, 2021

We've done a bit of experimentation with adding an additional large object output layer P6, following the EfficientDet example of increasing output layers for larger models, except in our case applying it to all models. The current models have P3 (stride 8, small) to P5 (stride 32, large) outputs. The P6 output layer is stride 64 and intended for extra-large objects. It seems to help normal COCO training at --img 640, and we've also gotten good results training at --img 1280.
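To make the stride arithmetic concrete, each output level Pk predicts on a square grid of (image size / stride) cells per side. The sketch below is illustrative (not code from the YOLOv5 repo) and computes the grid sizes for P3–P6:

```python
# Each YOLOv5 output level Pk predicts on a square grid of (img_size / stride)
# cells per side; P3..P6 use strides 8, 16, 32, 64.
def grid_sizes(img_size, strides=(8, 16, 32, 64)):
    return {f"P{k}": img_size // s for k, s in enumerate(strides, start=3)}

print(grid_sizes(1280))
# {'P3': 160, 'P4': 80, 'P5': 40, 'P6': 20}
```

At --img 1280 the P6 level predicts on a 20x20 grid, the same coarseness the P5 level has at --img 640, which is why P6 targets extra-large objects.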

The architecture changes we've made to add a P6 layer are here. The backbone is extended down to P6, and the PANet head now goes down to P3 (as usual) and back up to P6 instead of stopping at P5. New anchors were also added, evolved at --img 1280.
[Screenshot: P6 architecture changes, Feb 1, 2021]

The chart below shows the current YOLOv5l 4.0 model in blue, and the new YOLOv5l6 architecture in green and orange. The green and orange lines correspond to the same architecture trained at --img 640 (green) or --img 1280 (orange). The points plotted are evaluations of each model at image sizes from 384 to 1536 in steps of 128, with results plotted up to the max-mAP point. Code to reproduce is python test.py --task study following the changes from PR #2099.

[Chart: study-yolov5l speed–accuracy study]

The conclusion is that the P6 models increase performance on COCO under all scenarios we tested, though they are also a bit slower (roughly 10%) and larger (about 50% more parameters). They add only slightly more FLOPS, however, so training time and CUDA memory requirements are almost the same as for the P5 models. We are doing some more studies to see whether these might be suitable replacements for the current models. Let me know if you have any thoughts. For the time being, these models are available for auto-download just like the regular models, i.e. python detect.py --weights yolov5s6.pt.

@glenn-jocher added the enhancement label on Feb 1, 2021
@glenn-jocher self-assigned this on Feb 1, 2021
@laurenssam

laurenssam commented Feb 3, 2021

Hello Glenn,

I'm currently working with high-resolution images (8000 x 4000) and I don't want to downsample them too far, since the model might not be able to detect objects far away. So it would be super nice if we could use a model pretrained at high resolution, for example the 1280 models you're talking about. Is it possible to download the weights of these models? I was also curious whether you have models trained at higher resolution without the additional output layer; I don't think I necessarily need it.

Thanks!

@WANGCHAO1996

Hello Glenn, if the targets in my dataset are small, can I output only P2/P3/P4? How should I modify the YAML file? Thank you very much! @glenn-jocher

@glenn-jocher
Member Author

@laurenssam yes the P6 models can be manually downloaded here:
https://github.com/ultralytics/yolov5/releases/tag/v4.0

Any model in the latest release assets can also be auto-downloaded simply by asking for it in a command, i.e.:

```
python detect.py --weights yolov5s6.pt  # auto-download YOLOv5s P6 model
```

@glenn-jocher
Member Author

@WANGCHAO1996 yes the YAML files are easy to modify. You can remove (and add) outputs by simply changing the Detect() module inputs here. Inputs 17, 20 and 23 correspond to P3, P4 and P5 grids. Remember if you modify the number of layers here you should also modify your anchors correspondingly, or simply delete the anchors and replace them with anchors: 3 (to tell autoanchor to compute 3 anchors for each output layer).

```
[[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
```

@laurenssam

Thanks, but how do I know the resolution the model was trained on?

@glenn-jocher
Member Author

@laurenssam the P6 models were trained at 1280 by default, except for the ones denoted by -640 (to get an apples-to-apples comparison with the current models).

For training on large images, you can see our xview repo here. The general concept is to train on smaller 'chips' at native resolution, and then run inference either at native resolution if you can, or else use a sliding window whose results are stitched together afterwards.
https://github.com/ultralytics/xview-yolov3
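The sliding-window ("chipping") idea can be sketched as below. This is illustrative only: chip_origins, and the chip/overlap values, are my own placeholders, not code or settings from the xview repo.

```python
# Sketch: compute top-left origins of overlapping fixed-size chips that cover
# a large image at native resolution. Assumes width, height >= chip.
def chip_origins(width, height, chip=1280, overlap=128):
    step = chip - overlap
    xs = list(range(0, width - chip + 1, step))
    ys = list(range(0, height - chip + 1, step))
    if xs[-1] + chip < width:   # make sure the right edge is covered
        xs.append(width - chip)
    if ys[-1] + chip < height:  # make sure the bottom edge is covered
        ys.append(height - chip)
    return [(x, y) for y in ys for x in xs]

origins = chip_origins(8000, 4000)  # e.g. the 8000x4000 images mentioned above
```

Each origin is then cropped, run through the detector, and the boxes are offset back into full-image coordinates; the overlap reduces the chance of objects being cut at chip boundaries.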

@WANGCHAO1996

WANGCHAO1996 commented Feb 4, 2021

> @WANGCHAO1996 yes the YAML files are easy to modify. You can remove (and add) outputs by simply changing the Detect() module inputs here. Inputs 17, 20 and 23 correspond to P3, P4 and P5 grids. Remember if you modify the number of layers here you should also modify your anchors correspondingly, or simply delete the anchors and replace them with anchors: 3 (to tell autoanchor to compute 3 anchors for each output layer).
>
> [[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)

```yaml
# parameters
nc: 1  # number of classes
depth_multiple: 1.33  # model depth multiple
width_multiple: 1.25  # layer channel multiple

# anchors
anchors: 3

# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [ [ -1, 1, Focus, [ 64, 3 ] ],      # 0-P1/2
    [ -1, 1, Conv, [ 128, 3, 2 ] ],   # 1-P2/4
    [ -1, 3, C3, [ 128 ] ],
    [ -1, 1, Conv, [ 256, 3, 2 ] ],   # 3-P3/8
    [ -1, 9, C3, [ 256 ] ],
    [ -1, 1, Conv, [ 512, 3, 2 ] ],   # 5-P4/16
    [ -1, 9, C3, [ 512 ] ],
    [ -1, 1, Conv, [ 1024, 3, 2 ] ],  # 7-P5/32
    [ -1, 1, SPP, [ 1024, [ 5, 9, 13 ] ] ],
    [ -1, 3, C3, [ 1024, False ] ],   # 9
  ]

# YOLOv5 head
head:
  [ [ -1, 1, Conv, [ 512, 1, 1 ] ],
    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
    [ [ -1, 6 ], 1, Concat, [ 1 ] ],  # cat backbone P4
    [ -1, 3, C3, [ 512, False ] ],    # 13

    [ -1, 1, Conv, [ 256, 1, 1 ] ],
    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
    [ [ -1, 4 ], 1, Concat, [ 1 ] ],  # cat backbone P3
    [ -1, 3, C3, [ 256, False ] ],    # 17 (P3/8-small)

    [ -1, 1, Conv, [ 128, 1, 1 ] ],
    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
    [ [ -1, 2 ], 1, Concat, [ 1 ] ],  # cat backbone P2
    [ -1, 1, C3, [ 128, False ] ],    # 21 (P2/4-xsmall)

    [ -1, 1, Conv, [ 128, 3, 2 ] ],
    [ [ -1, 18 ], 1, Concat, [ 1 ] ], # cat head P3
    [ -1, 3, C3, [ 256, False ] ],    # 24 (P3/8-small)

    [ -1, 1, Conv, [ 256, 3, 2 ] ],
    [ [ -1, 14 ], 1, Concat, [ 1 ] ], # cat head P4
    [ -1, 3, C3, [ 512, False ] ],    # 27 (P4/16-medium)

    [ -1, 1, Conv, [ 512, 3, 2 ] ],
    [ [ -1, 10 ], 1, Concat, [ 1 ] ], # cat head P5
    [ -1, 3, C3, [ 1024, False ] ],   # 30 (P5/32-large)

    [ [ 21, 24, 27 ], 1, Detect, [ nc, anchors ] ],  # Detect(P2, P3, P4)
  ]
```
Thank you very much! Is that right? @glenn-jocher

@glenn-jocher
Member Author

@WANGCHAO1996 I don't know what you are asking and your post is poorly formatted. Use ``` for code sections.

@github-actions
Contributor

github-actions bot commented Mar 7, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions bot added the Stale label on Mar 7, 2021
@Shaotran

Shaotran commented Mar 7, 2021

Hi Glenn -- two questions for you:

1. I have the x6.pt weights file downloaded, but is there a --cfg yolov5x6.yaml model file to go along with it for training? Or is it still supposed to go with yolov5x.yaml?

2. You mention that the x6 model adds an additional layer "for extra-large objects," but also that it seems to perform better than the normal x model "under all conditions." Of course, I'll have to experiment, but just to understand your underlying intentions while creating the model -- would x6 still work for large images (1000px+) with "small" objects to detect (~100px+)? Thanks!

@github-actions bot removed the Stale label on Mar 8, 2021
@glenn-jocher
Member Author

@Shaotran all models contain their yaml files as attributes:

```python
ckpt = torch.load('yolov5x6.pt')  # load the checkpoint dict
model = ckpt['model']
print(model.yaml)  # the architecture definition used to build the model
```

The P6 models perform better on COCO than their P5 counterparts.

@github-actions
Contributor

github-actions bot commented Apr 8, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions bot added the Stale label on Apr 8, 2021
@ThuyHoang9001

ThuyHoang9001 commented Oct 14, 2021

@glenn-jocher Hello, in the case that I want to convert to TensorRT, I wonder what input size (640 or 1280) should be used for detection with the yolov5s6 model?

@glenn-jocher
Copy link
Member Author

@ThuyHoang9001 P6 models will get best results at --img 1280, and can also be used all the way down to --img 64 with decreasing levels of accuracy.
