CVPR 2021 - Streaming Perception Challenge #2402
-
@karthiksharma98 thanks for starting this discussion thread for the CVPR 2021 - Streaming Perception Challenge! I see there is a PR #2400 as well to provide autodownload support for the Argoverse-HD dataset. Looking at the PR, commands to start training YOLOv5 on the dataset with a 16 GB GPU might be:

```bash
$ python train.py --data argoverse_hd.yaml --weights yolov5s.pt --batch-size 64 --img 640
                                                     yolov5m.pt              40
                                                     yolov5l.pt              24
                                                     yolov5x.pt              16
```
-
I see the Argoverse-HD dataset is labelled as HD, so it may benefit from training at higher resolution with the larger P6 models:

```bash
$ python train.py --data argoverse_hd.yaml --weights yolov5s6.pt --batch-size 16 --img 1280
                                                     yolov5m6.pt              10
                                                     yolov5l6.pt               6
                                                     yolov5x6.pt               4
```
-
I've trained a few models on the Argoverse-HD dataset to understand how to help people get the most out of it. All of these runs are logged to W&B at https://wandb.ai/glenn-jocher/argoverse. The best results were achieved by finetuning a YOLOv5x6 model at 1920 resolution using the argo branch, with hyperparameters closely related to hyp.finetune.yaml.
**Results**
Results from treating this as a purely image-based (rather than sequence-based) dataset are below. Given the clearly sequence-based nature of the data, a better solution might be LSTM-, tracker- (Kalman, DeepSORT) or time-series-based.

**Labels**
This is a difficult dataset, due mainly to the small sizes of the objects relative to the large size of the images (1920x1200). That said, most of the small objects lie along the center horizontal plane, i.e. the horizon, a region naturally suited to higher-resolution inference. You can see all of this in labels.png, and you can see in labels_correlogram.png that the images are video frames, as the object centers track across the plots. You can also observe severe class imbalance, of course, reflecting the real-world distribution of these objects.

**Hyperparameters**
hyp.finetune.yaml worked much better than hyp.scratch.yaml when starting from COCO-trained weights.

**Confusion**
The most commonly confused classes were car-truck and truck-bus, which makes sense. All other pairs had confusion rates < 2%.

**PR Curve**
Bicycle and motorcycle were the most difficult classes, with the lowest recall, probably due to a combination of their low representation in the dataset and their small size.

**Improvement**
Clearly, targeting the center horizontal plane for super-resolution inference (i.e. 2x native) would be a good idea, possibly as part of a TTA (Test Time Augmentation) strategy; a rough sketch of this idea follows below. Introducing video-based strategies (LSTM, tracking) might bring great gains as well, but these are outside the scope of this simple training.
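To make the horizon-band TTA idea concrete, here is a minimal sketch using the YOLOv5 torch.hub API. The band fraction, inference sizes and class-agnostic NMS merge are all illustrative assumptions, not settings from the runs above:

```python
# Rough sketch of horizon-band TTA: run the full image normally, then
# re-run the central horizontal band at higher effective resolution
# and merge the two detection sets. Band fraction, sizes and the
# class-agnostic NMS merge are illustrative assumptions.
import cv2
import torch
from torchvision.ops import nms

model = torch.hub.load('ultralytics/yolov5', 'yolov5x6')

def detect_with_horizon_tta(img_path, band_frac=0.33):
    img = cv2.imread(img_path)[..., ::-1].copy()  # BGR -> RGB
    h = img.shape[0]

    # Full-image pass at the base resolution
    full = model(img, size=1280).xyxy[0]  # (n, 6): x1, y1, x2, y2, conf, cls

    # Crop the horizon band and infer at 2x the base resolution
    y0, y1 = int(h * (0.5 - band_frac / 2)), int(h * (0.5 + band_frac / 2))
    band = model(img[y0:y1].copy(), size=2560).xyxy[0].clone()
    band[:, [1, 3]] += y0  # map band boxes back to full-image coordinates

    # Merge both detection sets with class-agnostic NMS
    dets = torch.cat([full, band])
    keep = nms(dets[:, :4], dets[:, 4], iou_threshold=0.5)
    return dets[keep]
```

In practice you would tune the band position per camera and merge per class rather than class-agnostically, but this shows the general shape of the approach.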
-
We've just launched the first "Streaming Perception Challenge" at the Workshop on Autonomous Driving (WAD), held in conjunction with CVPR 2021. The challenge is hosted by a team from CMU & UIUC and includes two tracks for streaming object detection: detection-only and full-stack (detection + tracking + forecasting). The challenge aims to foster research in the area of streaming perception, which has garnered a lot of research interest since the paper "Towards Streaming Perception" was published last year; it received a Best Paper Honorable Mention at ECCV 2020. Unlike most existing challenges, algorithm latency is scored together with accuracy in a coherent manner: roughly, the evaluator queries the state of the world at fixed times and scores whatever prediction the algorithm has finished producing by each query. More details can be found on the challenge website. The total prize pool is $2700.
Please consider attending! If you have any questions, please feel free to contact us at the email address given on the website.
The project webpage can be found here: http://www.cs.cmu.edu/~mengtial/proj/streaming/. We recommend new participants check this out first.
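For readers new to the setup, here is a minimal sketch of that pairing rule. The actual challenge evaluation kit differs; function names and timings here are purely illustrative:

```python
# Minimal sketch of the streaming-evaluation pairing idea from
# "Towards Streaming Perception": at each query time, score the most
# recent prediction the algorithm has *finished* by that time. This is
# not the challenge's actual evaluation API.
from bisect import bisect_right

def pair_predictions(query_times, pred_finish_times, preds):
    """For each query time, return the latest prediction whose
    processing finished at or before that time (None if none has)."""
    paired = []
    for t in query_times:
        i = bisect_right(pred_finish_times, t) - 1
        paired.append(preds[i] if i >= 0 else None)
    return paired

# Example: predictions finish at 0.12 s and 0.31 s; queries arrive at
# 30 FPS. The query at t=0.100 sees no output yet; a slow but accurate
# detector is therefore penalized for the frames it misses.
print(pair_predictions([0.100, 0.133, 0.333], [0.12, 0.31], ['p0', 'p1']))
# -> [None, 'p0', 'p1']
```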