CVPR 2021 - Streaming Perception Challenge #2402
-
@karthiksharma98 thanks for starting this discussion thread for the CVPR 2021 - Streaming Perception Challenge! I see there is a PR #2400 as well to provide autodownload support for the Argoverse-HD dataset. Looking at the PR, commands to start training YOLOv5 on the dataset with a 16 GB GPU might be:

```bash
$ python train.py --data argoverse_hd.yaml --weights yolov5s.pt --batch-size 64 --img 640
                                                     yolov5m.pt              40
                                                     yolov5l.pt              24
                                                     yolov5x.pt              16
```
-
I see the Argoverse-HD dataset is labelled as HD, so it may benefit from training at higher resolution with the larger P6 models:

```bash
$ python train.py --data argoverse_hd.yaml --weights yolov5s6.pt --batch-size 16 --img 1280
                                                     yolov5m6.pt              10
                                                     yolov5l6.pt               6
                                                     yolov5x6.pt               4
```
-
I've trained a few models on the Argoverse-HD dataset to understand how to help people get the most out of it. All of these runs are logged to W&B at https://wandb.ai/glenn-jocher/argoverse. The best results were achieved by finetuning a YOLOv5x6 model at 1920 resolution using the argo branch, with hyperparameters closely related to hyp.finetune.yaml.
**Results**
Results from treating this as a purely image-based (rather than sequence-based) dataset are below. Given the clearly sequence-based nature of the data, a better solution might be LSTM-, tracker- (Kalman, DeepSORT) or time-series-based.

**Labels**
This is a difficult dataset, due mainly to the small sizes of the objects relative to the large size of the images (1920x1200). That said, most of the small objects lie along the center horizontal plane, i.e. the horizon, a region naturally suited to higher-resolution inference. You can see all of this in labels.png, and you can see in labels_correlogram.png that the images are video frames, as the object centers track across the plots. You can also observe severe class imbalance, of course, reflecting the real-world distribution of these objects.

**Hyperparameters**
hyp.finetune.yaml worked much better than hyp.scratch.yaml when starting from COCO-trained weights.

**Confusion**
The most commonly confused classes were car-truck and truck-bus, which makes sense. All other pairs had confusion rates < 2%.

**PR Curve**
Bicycle and motorcycle were the most difficult classes, with the lowest recall, probably due to a combination of their low representation in the dataset and their small size.

**Improvement**
Clearly, targeting the center horizontal plane for super-resolution inference (i.e. 2x native) would be a good idea, possibly as part of a TTA (Test Time Augmentation) strategy; a rough sketch of this idea follows below. Introducing video-based strategies (LSTM, tracking) might bring great gains as well, but these are outside the scope of this simple training.
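To make the horizon-band TTA idea concrete, here is a minimal sketch using the YOLOv5 torch.hub API. The band fraction, inference sizes and class-agnostic NMS merge are all illustrative assumptions, not settings from the runs above:

```python
# Rough sketch of horizon-band TTA: run the full image normally, then
# re-run the central horizontal band at higher effective resolution
# and merge the two detection sets. Band fraction, sizes and the
# class-agnostic NMS merge are illustrative assumptions.
import cv2
import torch
from torchvision.ops import nms

model = torch.hub.load('ultralytics/yolov5', 'yolov5x6')

def detect_with_horizon_tta(img_path, band_frac=0.33):
    img = cv2.imread(img_path)[..., ::-1].copy()  # BGR -> RGB
    h = img.shape[0]

    # Full-image pass at the base resolution
    full = model(img, size=1280).xyxy[0]  # (n, 6): x1, y1, x2, y2, conf, cls

    # Crop the horizon band and infer at 2x the base resolution
    y0, y1 = int(h * (0.5 - band_frac / 2)), int(h * (0.5 + band_frac / 2))
    band = model(img[y0:y1].copy(), size=2560).xyxy[0].clone()
    band[:, [1, 3]] += y0  # map band boxes back to full-image coordinates

    # Merge both detection sets with class-agnostic NMS
    dets = torch.cat([full, band])
    keep = nms(dets[:, :4], dets[:, 4], iou_threshold=0.5)
    return dets[keep]
```

In practice you would tune the band position per camera and merge per class rather than class-agnostically, but this shows the general shape of the approach.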
-
We've just launched the first "Streaming Perception Challenge" at the Workshop on Autonomous Driving (WAD), held in conjunction with CVPR 2021. The challenge is hosted by a team from CMU & UIUC and includes two tracks for streaming object detection: detection-only and full-stack (detection + tracking + forecasting). The challenge aims to foster research in the area of streaming perception, which has garnered a lot of research interest since the paper "Towards Streaming Perception" was published last year; it received a Best Paper Honorable Mention at ECCV 2020. Unlike most existing challenges, algorithm latency is scored together with accuracy in a coherent manner: roughly, the evaluator queries the state of the world at fixed times and scores whatever prediction the algorithm has finished producing by each query. More details can be found on the challenge website. The total prize pool is $2700.
Please consider attending! If you have any questions, please feel free to contact us at the email address given on the website.
The project webpage can be found here: http://www.cs.cmu.edu/~mengtial/proj/streaming/. We recommend new participants check this out first.
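For readers new to the setup, here is a minimal sketch of that pairing rule. The actual challenge evaluation kit differs; function names and timings here are purely illustrative:

```python
# Minimal sketch of the streaming-evaluation pairing idea from
# "Towards Streaming Perception": at each query time, score the most
# recent prediction the algorithm has *finished* by that time. This is
# not the challenge's actual evaluation API.
from bisect import bisect_right

def pair_predictions(query_times, pred_finish_times, preds):
    """For each query time, return the latest prediction whose
    processing finished at or before that time (None if none has)."""
    paired = []
    for t in query_times:
        i = bisect_right(pred_finish_times, t) - 1
        paired.append(preds[i] if i >= 0 else None)
    return paired

# Example: predictions finish at 0.12 s and 0.31 s; queries arrive at
# 30 FPS. The query at t=0.100 sees no output yet; a slow but accurate
# detector is therefore penalized for the frames it misses.
print(pair_predictions([0.100, 0.133, 0.333], [0.12, 0.31], ['p0', 'p1']))
# -> [None, 'p0', 'p1']
```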